Thursday, 14 October 2010

Danish letters in Solr on Tomcat

To make Danish/Swedish/Norwegian letters (or other special characters) searchable in Solr, I figured out that this was the way to go:

  1. Make sure that your Solr conf folder contains this file: mapping-ISOLatin1Accent.txt
    This file maps special characters to non-special characters: (å => aa, æ => ae, ø => o (oe) etc.)
  2. In your schema.xml, in your field type definition (e.g. text), the first element in the analyzer chain should be the mapper, i.e. the char filter that reads the mapping file.

    This goes for both the index analyzer and the query analyzer.
    At this point I thought that all was fine and dandy... but no! When I searched for Danish letters in the Solr admin interface, the letters were not encoded correctly.
  3. The fix turned out to be a setting in a Tomcat config file called server.xml (Tomcat x.x/conf/server.xml).
    This file has a Service node that contains a Connector node, and that node was missing the URIEncoding attribute. So I added the attribute, and my node now looked like this:

    <Connector port="8080" protocol="HTTP/1.1"
               URIEncoding="UTF-8"
               connectionTimeout="20000"
               redirectPort="8443" />

  4. A quick restart of the server and some IE cache clearing later and the problem was solved!
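
For reference, the analyzer setup described in step 2 might look roughly like the sketch below. The post does not name the mapper class, so this assumes Solr's solr.MappingCharFilterFactory pointed at mapping-ISOLatin1Accent.txt. The tokenizer, the lowercase filter, and the positionIncrementGap value are placeholder assumptions too, so keep whatever your own field type already uses.

```xml
<!-- Sketch of a "text" field type in schema.xml. The tokenizer and filter
     choices are assumptions; only the charFilter placement matters here. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- The mapper comes first, so characters are mapped before tokenizing -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same mapper at query time, so queries are normalized the same way -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same mapping is applied at both index and query time, a search for "ø" and the indexed text are reduced to the same form. Documents indexed before the change would presumably need to be reindexed to pick up the mapping.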

