Thursday, 14 October 2010

Splitting values into multivalued fields in Solr

When you need to split a value into a multivalued field (for example, when crawling websites, where the meta keywords should end up as separate values rather than one long comma-separated string), you can use this field type:



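<fieldType name="semicolonDelimited" class="solr.TextField">
  <analyzer>
    <!-- The pattern is a regular expression, so the pipe must be escaped.
         Despite the type's name, this splits on "|"; change the pattern to match your delimiter. -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|" />
  </analyzer>
</fieldType>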
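For the comma-separated meta keywords case mentioned above, a variant along these lines might work. This is only a sketch: the type name `commaDelimited` is made up for illustration, and the pattern assumes that values are separated by a comma with optional whitespace on either side.

```xml
<!-- Hypothetical variant: the name "commaDelimited" is an assumption, not part of the original setup. -->
<fieldType name="commaDelimited" class="solr.TextField">
  <analyzer>
    <!-- Split on a comma and swallow any whitespace around it,
         so "solr, lucene ,search" yields the tokens solr / lucene / search. -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s*,\s*" />
  </analyzer>
</fieldType>
```

Whitespace at the very start or end of the whole value is not touched by this pattern, so a trim step may still be needed depending on the input.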


One thing I discovered is that in the search result XML, the field output is formatted exactly like the input, so it looks as if the tokenization has failed. It has not: the search result returns the stored value, which is always the original input, while the tokenizer only affects the indexed terms. When you look at the field in the schema browser, you can see that the values have been separated nicely.
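To make the stored versus indexed distinction concrete, a field using the type could be declared roughly as below. The field name `keywords` is an assumption chosen for illustration; only the `type` value comes from the field type defined above.

```xml
<!-- Hypothetical field declaration: the name "keywords" is an assumption. -->
<!-- stored="true":  the original input (e.g. "lucene|solr|search") is returned verbatim in search results. -->
<!-- indexed="true": the analyzer runs, so the separate terms lucene, solr and search are what gets searched. -->
<field name="keywords" type="semicolonDelimited" indexed="true" stored="true" />
```

With a declaration like this, a query such as `keywords:solr` should match the document even though the returned field value still shows the whole pipe-separated string.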
