Create README.md

Alexander Kuznetsov 2015-06-09 23:00:58 +03:00
parent 393665f08a
commit 6ca2b27781

README.md Normal file

@ -0,0 +1,70 @@
# Russian Morphology for Lucene
Russian and English morphology for Java and the Lucene 3.0 framework, based on the open source dictionary from [АОТ](http://aot.ru). It uses dictionary-based morphology with some heuristics for unknown words. It supports homonyms: for example, for the Russian word "вина" it returns two variants, "вино" and "вина".
### How to use
First, download
[morph-1.1.jar](https://bintray.com/artifact/download/akuznetsov/russianmorphology/org/apache/lucene/morphology/morph/1.1/morph-1.1.jar)
and add it to your classpath. Then download the [Russian](https://bintray.com/artifact/download/akuznetsov/russianmorphology/org/apache/lucene/morphology/russian/1.1/russian-1.1.jar) or
[English](https://bintray.com/artifact/download/akuznetsov/russianmorphology/org/apache/lucene/morphology/english/1.1/english-1.1.jar) package.
If you use Maven, you can add the dependencies:

    <dependency>
        <groupId>org.apache.lucene.morphology</groupId>
        <artifactId>russian</artifactId>
        <version>1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene.morphology</groupId>
        <artifactId>english</artifactId>
        <version>1.1</version>
    </dependency>

Don't forget to add the repository:

    <repositories>
        ...............
        <repository>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
            <id>bintray-akuznetsov-russianmorphology</id>
            <name>bintray</name>
            <url>http://dl.bintray.com/akuznetsov/russianmorphology</url>
        </repository>
    </repositories>

Now you can create a Lucene Analyzer:

    RussianAnalyzer russian = new RussianAnalyzer();
    EnglishAnalyzer english = new EnglishAnalyzer();

You can write your own analyzer using the filter that converts each word into its base forms:

    LuceneMorphology luceneMorph = new EnglishLuceneMorphology();
    // `result` is the upstream TokenStream that the filter wraps
    TokenStream tokenStream = new MorphologyFilter(result, luceneMorph);

Because a LuceneMorphology object usually holds a lot of data that it needs in order to work, it is better not to create a new one for each MorphologyFilter. Create it once and reuse it.
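
The "create once, reuse everywhere" advice can be sketched without the library itself. The `SharedInstance` class below is a hypothetical helper, not part of this project: it builds a costly object on first use and returns the same instance afterwards. With the real library, the factory would construct the morphology object (for example an `EnglishLuceneMorphology`), wrapping any checked exception its constructor may throw; that wiring is an assumption and is not shown here.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper (not part of the library): builds a costly object
// once and hands the same instance to every caller, safely across threads.
final class SharedInstance<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    // First caller pays the construction cost; others reuse the result.
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        AtomicInteger builds = new AtomicInteger();
        // Stand-in for an expensive dictionary load.
        SharedInstance<Object> shared = new SharedInstance<>(() -> {
            builds.incrementAndGet();
            return new Object();
        });
        Object first = shared.get();
        Object second = shared.get();
        System.out.println("same instance: " + (first == second));
        System.out.println("factory calls: " + builds.get());
    }
}
```

Every filter would then call `get()` on one shared holder instead of constructing its own morphology object.
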
If you need a list of the base forms of a word, you can use the following example:

    LuceneMorphology luceneMorph = new EnglishLuceneMorphology();
    List<String> wordBaseForms = luceneMorph.getMorphInfo(word);

### Restrictions
* It works only with UTF-8.
* It assumes that the letters е and ё are the same.
* Word forms with prefixes, like "наибольший", are treated as separate words.
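
The first two restrictions suggest preparing input on the caller's side. The sketch below is a hypothetical helper, not part of the library: it decodes bytes explicitly as UTF-8 rather than relying on the platform default, and folds "ё" into "е" so that caller-side comparisons agree with the library's view of those letters. Lower-casing the word is an extra assumption made for illustration. Unicode escapes are used so the source compiles under any file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper (not part of the library) for preparing input words.
final class MorphInput {
    private static final Locale RUSSIAN = new Locale("ru");

    // Decode raw bytes explicitly as UTF-8 instead of the platform default charset.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Lower-case the word, then fold U+0451 (ё) into U+0435 (е).
    static String normalize(String word) {
        return word.toLowerCase(RUSSIAN).replace('\u0451', '\u0435');
    }

    public static void main(String[] args) {
        // "\u0401\u043b\u043a\u0430" is the word "Ёлка" written with escapes.
        byte[] raw = "\u0401\u043b\u043a\u0430".getBytes(StandardCharsets.UTF_8);
        System.out.println(normalize(decodeUtf8(raw)));
    }
}
```

A caller could pass the normalized word on to the analyzer or morphology object, so that spellings with "ё" and "е" are handled the same way in the surrounding application code as well.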