Committed! (LUCENE-1166) A tokenfilter to decompose compound words

LUCENE-1166 (A tokenfilter to decompose compound words) has been committed today. Hooray...

Hi! I tried to use your

Hi!
I tried to use your DictionaryCompoundWordTokenFilter, but I have some problems...
First, I'm not sure how to use a dictionary. I tried
Set dic = WordlistLoader.getWordSet(new File("de_DE.dic")); with de_DE.dic being an open-source dictionary. It seems to work, but the items contain more information than just the word, e.g. "kampfgeist/STozm". Is this how it should be? Or could you give me a hint on where to find out how to use it?

Then I ran into an error that says:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.lucene.analysis.Token.termLength()I
at org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase.decompose(CompoundWordTokenFilterBase.java:161)

That's very strange, because termLength() is a known method! And the current (first) token is (ratz,0,4,type=), all normal.

Thank You for a reply,
JanaT

Dictionaries

Hi!

Your dictionaries should contain only words and no extra information. Here is an example:

Hallo
Welt
Spaß
Geht
Sommer
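
If the word list comes from a spell-checker dictionary with entries like "kampfgeist/STozm", one way to get a plain list is to cut each line at the first slash. The following is only a sketch: it assumes a Hunspell/MySpell-style file (optional numeric count on the first line, flags after "/"), and the class and method names DicCleaner and plainWords are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class DicCleaner {

    /**
     * Returns the bare words from spell-checker style lines.
     * Everything from the first '/' onward is dropped, blank lines are skipped.
     */
    static List<String> plainWords(List<String> dicLines) {
        List<String> words = new ArrayList<String>();
        for (int i = 0; i < dicLines.size(); i++) {
            String line = dicLines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            // Assumption: such files may start with a purely numeric entry count.
            if (i == 0 && line.matches("\\d+")) {
                continue;
            }
            int slash = line.indexOf('/');
            String word = slash >= 0 ? line.substring(0, slash) : line;
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }
}
```

The resulting words can be written one per line to a new file, which is the format shown above.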

For the problem with termLength: maybe you have an old Lucene version in your CLASSPATH (or server lib dir).
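
To check the CLASSPATH theory, it can help to print where a class was actually loaded from and whether it has the method in question. This is a rough sketch using only standard Java reflection; ClasspathCheck and its method names are hypothetical helpers, not part of Lucene.

```java
import java.security.CodeSource;

final class ClasspathCheck {

    /** Returns the jar or directory a class was loaded from, if the JVM reports one. */
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // JDK classes loaded by the bootstrap loader usually have no code source.
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    /** Returns true if the class has a public no-argument method with the given name. */
    static boolean hasMethod(Class<?> c, String name) {
        try {
            c.getMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

Calling locationOf(org.apache.lucene.analysis.Token.class) and hasMethod(org.apache.lucene.analysis.Token.class, "termLength") from the failing application should show which jar is being picked up and whether it is too old.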

CU
Thomas

Need help for Using your DictionaryCompoundWordTokenFilter

After I updated to hibernate-search 3.1 and Lucene 2.4.0, I tried to use your 'DictionaryCompoundWordTokenFilter'.
I have never found an example anywhere of how to use your filter. Here is my code:

I have downloaded the file de.xml and the DTD file. Maybe someone can help?

public class GermanWordTokenAnalyzer extends Analyzer {

    @Override
    @SuppressWarnings( "unchecked" )
    public TokenStream tokenStream( String fieldName, Reader reader ) {

        TokenStream result = null;

        try {
            Set dic = WordlistLoader.getWordSet( new File( "/lucene/de.xml" ) );

            result = new DictionaryCompoundWordTokenFilter( new StandardTokenizer( reader ), dic );

            return result;

        } catch( Exception e ) {
            // TODO: handle exception
        }

        return result;
    }

}

Hans

Examples for the compound word filter

Hi Hans!

You can find examples in the unit test for the filter:
http://svn.apache.org/viewvc/lucene/java/trunk/contrib/analyzers/src/tes...

and in the Javadoc for the filter:
http://lucene.apache.org/java/2_4_0/api/contrib-analyzers/org/apache/luc...

CU
Thomas
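
For readers who only want the general idea of dictionary-based decomposition, here is a simplified sketch in plain Java. It is not the Lucene implementation: it just collects every substring of a token that appears in a lower-cased word set. The names CompoundSketch and decompose and the minSub/maxSub parameters are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class CompoundSketch {

    /**
     * Collects all dictionary words found inside the token, scanning left to right.
     * The dictionary is expected to hold lower-case words only.
     */
    static List<String> decompose(String token, Set<String> lowerCaseDict, int minSub, int maxSub) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<String>();
        for (int start = 0; start + minSub <= lower.length(); start++) {
            // Try every allowed subword length at this start position.
            for (int len = minSub; len <= maxSub && start + len <= lower.length(); len++) {
                String candidate = lower.substring(start, start + len);
                if (lowerCaseDict.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }
}
```

With a dictionary containing "sommer" and "welt", the token "Sommerwelt" yields the two parts "sommer" and "welt". A real filter additionally has to keep the original token and set offsets and position increments, which this sketch leaves out.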
