From 6ca2b27781a2aa30d225f00b420d03f9d6f0f42a Mon Sep 17 00:00:00 2001
From: Alexander Kuznetsov
Date: Tue, 9 Jun 2015 23:00:58 +0300
Subject: [PATCH] Create README.md

---
 README.md | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..2985d35
--- /dev/null
+++ b/README.md
@@ -0,0 +1,70 @@
+# Russian Morphology for Lucene
+
+Russian and English morphology for Java and the Lucene 3.0 framework, based on the open-source dictionary from [АОТ](http://aot.ru). It uses dictionary-based morphology with some heuristics for unknown words. It supports homonyms: for example, for the Russian word "вина" it returns two variants, "вино" (wine) and "вина" (guilt).
+
+
+### How to use
+
+First download
+[morph-1.1.jar](https://bintray.com/artifact/download/akuznetsov/russianmorphology/org/apache/lucene/morphology/morph/1.1/morph-1.1.jar)
+and add it to your class path. Then download the [Russian](https://bintray.com/artifact/download/akuznetsov/russianmorphology/org/apache/lucene/morphology/russian/1.1/russian-1.1.jar) or
+[English](https://bintray.com/artifact/download/akuznetsov/russianmorphology/org/apache/lucene/morphology/english/1.1/english-1.1.jar) package.
+
+If you use Maven, you can add the dependencies:
+
+    <dependency>
+        <groupId>org.apache.lucene.morphology</groupId>
+        <artifactId>russian</artifactId>
+        <version>1.1</version>
+    </dependency>
+
+
+    <dependency>
+        <groupId>org.apache.lucene.morphology</groupId>
+        <artifactId>english</artifactId>
+        <version>1.1</version>
+    </dependency>
+
+Don't forget to add the repository:
+
+
+    <repositories>
+        ...............
+        <repository>
+            <snapshots>
+                <enabled>false</enabled>
+            </snapshots>
+            <id>bintray-akuznetsov-russianmorphology</id>
+            <name>bintray</name>
+            <url>http://dl.bintray.com/akuznetsov/russianmorphology</url>
+        </repository>
+    </repositories>
+
+
+
+Now you can create a Lucene analyzer:
+
+    // Ready-made analyzers for Russian and English text
+    RussianAnalyzer russian = new RussianAnalyzer();
+    EnglishAnalyzer english = new EnglishAnalyzer();
+
+
+You can also write your own analyzer using a filter that converts words to their base forms:
+
+    LuceneMorphology luceneMorph = new EnglishLuceneMorphology();
+    TokenStream tokenStream = new MorphologyFilter(result, luceneMorph); // result: the upstream TokenStream
+
+Because a LuceneMorphology instance usually holds a lot of data, it is better not to create a new one for each MorphologyFilter; create it once and reuse it.
+
+Also, if you need a list of the base forms of a word, you can use the following example:
+
+    // word: the String to look up
+    LuceneMorphology luceneMorph = new EnglishLuceneMorphology();
+    List<String> wordBaseForms = luceneMorph.getMorphInfo(word);
+
+
+### Restrictions
+
+ * It works only with UTF-8.
+ * It assumes that the letters е and ё are the same.
+ * Word forms with prefixes, such as "наибольший", are treated as separate words.
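
The е/ё restriction above can matter for calling code. As a minimal sketch, assuming that lookups expect е in place of ё (the README only says the two letters are treated as the same, so this is an assumption), input words could be folded before they are passed on. `YoNormalizer` is a hypothetical helper for illustration and is not part of the library.

```java
// Hypothetical helper (not part of the library): folds the letter "yo"
// into "ye" so that both spellings of a word produce the same string.
final class YoNormalizer {
    private YoNormalizer() {}

    static String normalize(String word) {
        // \u0451 is lowercase yo, \u0435 is lowercase ye;
        // \u0401 is uppercase yo, \u0415 is uppercase ye.
        return word.replace('\u0451', '\u0435').replace('\u0401', '\u0415');
    }

    public static void main(String[] args) {
        // "yolka" spelled with yo and with ye should compare equal after folding.
        String withYo = "\u0451\u043B\u043A\u0430";
        String withYe = "\u0435\u043B\u043A\u0430";
        System.out.println(normalize(withYo).equals(normalize(withYe))); // prints "true"
    }
}
```

The literals are written as Unicode escapes so the sketch compiles regardless of the source file encoding. Whether such folding is needed at all depends on how the library handles ё internally, which the README does not specify.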