I have created a token filter for Apache Lucene that works pretty well. You can find the code here: http://issues.apache.org/jira/browse/LUCENE-1166.
It works as I described in my earlier blog entries.
I also created a Swedish hyphenation grammar. It is attached to the blog entry.
Update [2008-03-09]:
I have submitted the Swedish grammar to the OFFO project for inclusion in its next release. The patch is available here: http://sourceforge.net/tracker/index.php?func=detail&aid=1906166&group_id=116740&atid=678288
Update [2008-04-18]:
I am experimenting with replacing the HashMap dictionary lookup with a Lucene index lookup. So far I have no numbers that show a speedup for the dumb compound word token filter.
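To make the lookup being replaced a bit more concrete, here is a minimal sketch of a brute-force, hash-based dictionary lookup. It is not the code from the LUCENE-1166 patch: the class name `DictionaryLookupSketch`, the method `subwords`, the length limits, and the sample words are all made up for illustration, and I am assuming that "dumb" means testing every substring of a token against the dictionary.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DictionaryLookupSketch {

    // Hypothetical helper: returns every substring of `word` (lowercased)
    // whose length is between minLen and maxLen and that is in `dictionary`.
    public static List<String> subwords(String word, Set<String> dictionary,
                                        int minLen, int maxLen) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> found = new ArrayList<>();
        for (int start = 0; start + minLen <= lower.length(); start++) {
            for (int len = minLen; len <= maxLen && start + len <= lower.length(); len++) {
                String candidate = lower.substring(start, start + len);
                // One hash lookup per candidate substring.
                if (dictionary.contains(candidate)) {
                    found.add(candidate);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Illustrative dictionary only; a real one would be loaded from a word list.
        Set<String> dictionary = new HashSet<>(List.of("fotboll", "fot", "boll", "plan"));
        System.out.println(subwords("Fotbollsplan", dictionary, 3, 10));
        // prints [fot, fotboll, boll, plan]
    }
}
```

Each candidate substring costs one hash lookup, so whether an index lookup can beat this is something only measurements will tell.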
| Attachment | Size |
|---|---|
| se.xml | 31.49 KB |


