Ajax and Character Sets

I was recently adding some dynamic functionality to my current webapp project, and I must say Rails’ AJAX support is really slick. However, I ran into some issues with character set encodings, and I want to share my findings here.

Since my webapp is for a German audience only, I’m using the ISO-8859-15 character set encoding for all my pages (that is latin-1 + euro sign). So far, I have been setting the correct encoding in the HTML I’m producing and everything worked fine. You could even enter special German characters (like “Ä” or “ß”) in forms without any problems.

As soon as I added AJAX support, however, funny characters started to appear in two places:

  1. HTML snippets that contained special characters were not displayed properly.
  2. Special characters that users entered in HTML forms were stored in the database as multi-byte sequences and were not displayed properly later.
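
The second symptom can be reproduced outside the browser. The sketch below assumes a current Ruby (1.9 or later, where strings carry an encoding), not the Ruby my app runs on, and the letter “ö” is just an example value. It takes the UTF-8 bytes of one character and reads them as if they were ISO-8859-15, which is roughly what happens when such bytes are stored and displayed unchanged:

```ruby
# "ö" as a UTF-8 string: two bytes, 0xC3 0xB6.
utf8 = "\u00F6"
utf8_bytes = utf8.bytes.map { |b| format("%02X", b) }.join(" ")

# Reinterpret the same bytes as ISO-8859-15 (no conversion takes place),
# then transcode to UTF-8 so the result can be printed on a UTF-8 terminal.
misread = utf8.dup.force_encoding("ISO-8859-15").encode("UTF-8")

puts utf8_bytes      # C3 B6
puts misread.length  # 2
puts misread         # Ã¶
```

One character turns into two, which matches the “multi-byte sequences” I saw in the database.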

Before digging into the issues, one comment: the recommended approach to fix all of this is to simply use UTF-8 everywhere. This was not an option for me, though, because some old programs that do not understand UTF-8 need to use the data, too, so I decided to try sticking with ISO-8859-15.

The first issue was addressed quickly: I was setting the character encoding for my pages in an HTML “meta” header. Obviously, this header wasn’t sent with the HTML snippets that AJAX is serving, so the browser had no idea what encoding was used for the snippet. Setting the encoding in the HTTP header helped. One way to do this is with a before-filter in application.rb:


before_filter :set_charset
def set_charset
    @headers["Content-Type"] = "text/html; charset=ISO-8859-15" 
end

The other issue was trickier. It seemed that form submissions via AJAX came in UTF-8 encoded, while normal form submissions came in ISO-8859-15 encoded. Since I always want to know why something works the way it works, I did some research to find out which character encoding is used in HTTP requests from the client to the server. I found this great page about form submission and i18n. To sum it up briefly:

  1. There is no header in the HTTP request telling you the character encoding used
  2. POST requests will always (i.e. most of the time, in standards-compliant browsers) use the same encoding that was used to serve the page.
  3. GET requests are not allowed to use non-ASCII characters. Browsers don’t care about this, however, and use the same strategy as for POST requests.

These rules make sense and explain why everything had worked fine so far. It can get tricky if you serve pages in different encodings, but I won’t go into this (simply use UTF-8 in this case).
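
To make the rules above concrete, here is a small sketch of what the same form value looks like on the wire depending on the encoding of the page it was submitted from. It assumes a current Ruby with the standard `uri` library; the value “Ä” is only an example:

```ruby
require "uri"

value = "\u00C4"  # "Ä"

# Percent-encode the bytes the browser would send for a page served as UTF-8 ...
as_utf8 = URI.encode_www_form_component(value)

# ... and for a page served as ISO-8859-15 (transcode first, then escape).
as_latin9 = URI.encode_www_form_component(value.encode("ISO-8859-15"))

puts as_utf8    # %C3%84
puts as_latin9  # %C4
```

Nothing in either string says which encoding was used, so the server has to know (or guess) how to interpret the bytes.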

Now back to AJAX. Why didn’t it work there? Well, the JavaScript serializes all the fields and it always uses url-encoded UTF-8 strings for this! So I wrote another before filter that converted these UTF-8 strings back to the encoding I expected, but only did this for actions that were called via AJAX. It looks like this in application.rb:


before_filter :convert_charset

private

def convert_charset
    # Convert to the encoding the pages are served in (ISO-8859-15).
    @params.iconv!("iso-8859-15", "utf-8") if
        ajax_actions.include?(@params["action"])
end

def ajax_actions
    []
end

In my real controller, I simply override ajax_actions to return a list of action names for which the conversion should be done. Note that I’m using Ruby’s open-class principle to add the iconv! method to Hash. I put the following code into an extra file that I include from environment.rb upon startup.


require 'iconv'
class Hash
    def iconv!(to,from)
        iconv = Iconv.new(to,from)
        begin
            perform_iconv!(iconv)
        ensure
            # Release the converter even if a string cannot be converted.
            iconv.close
        end
    end

    def perform_iconv!(iconv)
        each_pair do |key,value|
            case value
                when String
                    self[key] = iconv.iconv(value)
                when Hash
                    value.perform_iconv!(iconv)
            end
        end
    end
end
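
A note for newer Rubies: Iconv was removed from the standard library in Ruby 2.0, so the code above will not load there without the separate iconv gem. A rough equivalent using String#encode might look like the sketch below. The helper name `transcode_params` is my own invention (it is not part of Rails), it returns a converted copy instead of modifying the hash in place, and unlike the version above it also walks into arrays:

```ruby
# Hypothetical helper (not part of Rails): returns a copy of a nested,
# params-like structure with every String transcoded from `from` to `to`.
def transcode_params(value, to: "ISO-8859-15", from: "UTF-8")
  case value
  when String
    # Tag the raw bytes with the encoding they really use, then convert.
    value.dup.force_encoding(from).encode(to)
  when Hash
    value.each_with_object({}) do |(key, inner), out|
      out[key] = transcode_params(inner, to: to, from: from)
    end
  when Array
    value.map { |inner| transcode_params(inner, to: to, from: from) }
  else
    value
  end
end

params = { "user" => { "name" => "J\u00FCrgen", "tags" => ["Stra\u00DFe"] } }
converted = transcode_params(params)

puts converted["user"]["name"].encoding  # ISO-8859-15
puts converted["user"]["name"].bytesize  # 6
```

Characters that do not exist in the target encoding make `encode` raise Encoding::UndefinedConversionError, so a real filter would need to decide how to handle those.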

That’s it! Not too bad after all. But if you can, always use UTF-8 and you won’t have these problems.

If you want to learn more about i18n issues and character encodings, the W3C has good coverage of I18N topics, including information about character encodings and form submissions.