peuss.de
Committed! (LUCENE-1166) A tokenfilter to decompose compound words
Submitted by Thomas on Fri, 2008-05-16 13:57
LUCENE-1166 (A tokenfilter to decompose compound words) has been committed today. Hooray...
Hi!
I tried to use your DictionaryCompoundWordTokenFilter, but I have some problems.
First, I'm not sure how to use a dictionary. I tried
Set dic = WordlistLoader.getWordSet(new File("de_DE.dic"));
with de_DE.dic being an open-source dictionary. It seems to work, but the entries contain more information than just the word, e.g. "kampfgeist/STozm". Is that how it should be? Or could you give me a hint on where to find out how to use it?
Then I ran into an error that says:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.lucene.analysis.Token.termLength()I
at org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase.decompose(CompoundWordTokenFilterBase.java:161)
That's very strange, because termLength() is a known method! And the current (first) token is (ratz,0,4,type=), all normal.
Thank you for any reply,
JanaT
Dictionaries
Hi!
Your dictionaries should contain only words and no extra information. Here is an example:
Hallo
Welt
Spaß
Geht
Sommer
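An entry such as "kampfgeist/STozm" carries affix flags after the slash, as is usual in Hunspell-style .dic files, whose first line is normally an entry count. A minimal sketch of reducing such lines to bare words follows; the class and method names (DicCleaner, stripFlags) are hypothetical, and it assumes one entry per line with any flags introduced by the first "/".

```java
import java.util.ArrayList;
import java.util.List;

final class DicCleaner {
    /** Returns the bare words from .dic-style lines, dropping any "/FLAGS" suffix. */
    static List<String> stripFlags(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            String entry = line.trim();
            // Skip blank lines and the digits-only entry-count line.
            if (entry.isEmpty() || entry.matches("\\d+")) {
                continue;
            }
            // Assumption: everything from the first '/' onward is flag data.
            int slash = entry.indexOf('/');
            String word = slash >= 0 ? entry.substring(0, slash) : entry;
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }
}
```

The resulting words could then be written one per line to a plain text file and loaded with WordlistLoader.getWordSet as before.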
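For the problem with termLength: maybe you have an old Lucene version in your CLASSPATH (or server lib dir). A NoSuchMethodError at runtime usually means that the class which was actually loaded comes from a different version than the one the code was compiled against.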
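One way to check for a stale jar is to ask the JVM where it loaded a class from. The following is a minimal sketch using only standard JDK calls; the class name WhichJar and the returned placeholder strings are made up for illustration.

```java
import java.security.CodeSource;

final class WhichJar {
    /** Describes where the named class was loaded from, or why that is unknown. */
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the JDK's bootstrap loader report no code source.
            return src == null ? "(bootstrap class path)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on class path)";
        }
    }

    public static void main(String[] args) {
        // Prints the jar or directory that supplies Lucene's Token class, if any.
        System.out.println(locate("org.apache.lucene.analysis.Token"));
    }
}
```

If the printed location is an older Lucene jar than expected, removing it from the class path (or the server's lib directory) should be the first thing to try.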
CU
Thomas
Need help for Using your DictionaryCompoundWordTokenFilter
After updating to hibernate-search 3.1 and lucene 2.4.0, I tried to use your 'DictionaryCompoundWordTokenFilter'.
I have never found an example of how to use your filter anywhere. I have downloaded the file de.xml and the DTD file. Maybe someone can help?
Here is my code:
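import java.io.File;
import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WordlistLoader;
import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class GermanWordTokenAnalyzer extends Analyzer {
    @Override
    @SuppressWarnings( "unchecked" )
    public TokenStream tokenStream( String fieldName, Reader reader ) {
        TokenStream result = null;
        try {
            // Load the dictionary and wrap the tokenizer with the compound word filter.
            Set dic = WordlistLoader.getWordSet( new File( "/lucene/de.xml" ) );
            result = new DictionaryCompoundWordTokenFilter( new StandardTokenizer( reader ), dic );
            return result;
        } catch( Exception e ) {
            // TODO: handle exception
        }
        return result;
    }
}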
Hans
Examples for the compound word filter
Hi Hans!
You can find examples in the unit test for the filter:
http://svn.apache.org/viewvc/lucene/java/trunk/contrib/analyzers/src/tes...
and in the Javadoc for the filter:
http://lucene.apache.org/java/2_4_0/api/contrib-analyzers/org/apache/luc...
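To illustrate the general idea of dictionary-based decomposition, here is a deliberately simplified, Lucene-free sketch. It is not the filter's actual implementation: the class CompoundSketch, its method, and the size limits passed in are assumptions for illustration, and it expects a lowercase dictionary with one plain word per entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class CompoundSketch {
    /** Returns every dictionary word found inside token, scanning left to right. */
    static List<String> decompose(String token, Set<String> dictionary,
                                  int minSubwordSize, int maxSubwordSize) {
        // The dictionary is assumed to hold lowercase words, so lowercase the token too.
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            int maxLen = Math.min(maxSubwordSize, lower.length() - start);
            // Try every substring length allowed at this start position.
            for (int len = minSubwordSize; len <= maxLen; len++) {
                String candidate = lower.substring(start, start + len);
                if (dictionary.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }
}
```

With a dictionary containing "sommer" and "welt", calling decompose("Sommerwelt", dictionary, 2, 15) yields the two subwords "sommer" and "welt".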
CU
Thomas