Create README.md

2015-06-09 23:00:58 +03:00
parent 393665f08a
commit 6ca2b27781
1 changed files with 70 additions and 0 deletions
@@ -0,0 +1,70 @@
+# Russian Morphology for Lucene
+
+Russian and English morphology for Java and the Lucene 3.0 framework, based on the open-source dictionary from the [АОТ](http://aot.ru) site. It uses dictionary-based morphology with some heuristics for unknown words. It supports homonyms: for example, for the Russian word "вина" it gives two variants, "вино" and "вина".
+
+
+### How to use
+
+First download 
+[morph-1.1.jar](https://bintray.com/artifact/download/akuznetsov/russianmorphology/org/apache/lucene/morphology/morph/1.1/morph-1.1.jar)  
+and add it to your classpath. Then download the [Russian](https://bintray.com/artifact/download/akuznetsov/russianmorphology/org/apache/lucene/morphology/russian/1.1/russian-1.1.jar) or 
+[English](https://bintray.com/artifact/download/akuznetsov/russianmorphology/org/apache/lucene/morphology/english/1.1/english-1.1.jar) package and add it to the classpath as well. 
+
+If you use Maven, you can add the dependencies instead:
+
+        <dependency>
+            <groupId>org.apache.lucene.morphology</groupId>
+            <artifactId>russian</artifactId>
+            <version>1.1</version>
+        </dependency>
+
+
+        <dependency>
+            <groupId>org.apache.lucene.morphology</groupId>
+            <artifactId>english</artifactId>
+            <version>1.1</version>
+        </dependency>
+
+Don't forget to add the repository:
+
+
+    <repositories>
+    ...............
+      <repository>
+        <snapshots>
+          <enabled>false</enabled>
+        </snapshots>
+        <id>bintray-akuznetsov-russianmorphology</id>
+        <name>bintray</name>
+        <url>http://dl.bintray.com/akuznetsov/russianmorphology</url>
+      </repository>
+    </repositories>
+
+
+
+Now you can create a Lucene Analyzer 
+
+
+      RussianAnalyzer russian = new RussianAnalyzer();
+      EnglishAnalyzer english = new EnglishAnalyzer();
+
+
+You can write your own analyzer using the filter that converts words to their base forms. 
+
+      LuceneMorphology luceneMorph = new EnglishLuceneMorphology();
+      TokenStream tokenStream = new MorphologyFilter(result, luceneMorph);
+
+Because a LuceneMorphology object usually holds a lot of data that it needs in order to work, it is better not to create a new one for each MorphologyFilter; create it once and reuse it.
+
+Also, if you need to get the list of base forms of a word, you can use the following example: 
+
+
+     LuceneMorphology luceneMorph = new EnglishLuceneMorphology();
+     List<String> wordBaseForms = luceneMorph.getMorphInfo(word);
+
+
+### Restrictions
+  
+  * It works only with UTF-8.
+  * It assumes that the letters е and ё are the same.
+  * Word forms with prefixes, like "наибольший", are treated as separate words.
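
The first two restrictions matter mostly when you compare the library's output with your own strings. The sketch below shows one possible way to prepare input on the caller's side: decode bytes explicitly as UTF-8, then fold "ё" into "е". `MorphologyInput`, `decodeUtf8` and `foldYo` are hypothetical helper names, not part of this library, and the direction of the folding (ё to е) is an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the morphology library.
final class MorphologyInput {
    private MorphologyInput() {}

    /** Decodes raw bytes as UTF-8, the only encoding the README lists as supported. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /**
     * Replaces "ё" with "е" (and "Ё" with "Е") so that both spellings compare equal.
     * Assumption: folding towards "е" matches how the library treats the two letters.
     * Unicode escapes keep the source independent of the compiler's file encoding.
     */
    static String foldYo(String word) {
        return word
                .replace('\u0451', '\u0435')  // ё -> е
                .replace('\u0401', '\u0415'); // Ё -> Е
    }

    public static void main(String[] args) {
        // "ёлка" encoded as UTF-8 bytes, as it might arrive from a file or socket.
        byte[] raw = "\u0451\u043b\u043a\u0430".getBytes(StandardCharsets.UTF_8);
        String folded = foldYo(decodeUtf8(raw));
        // Compare against "елка" instead of printing, to avoid console encoding issues.
        System.out.println(folded.equals("\u0435\u043b\u043a\u0430"));
    }
}
```

Applying the same folding to any string you compare against the returned base forms keeps the two sides consistent, whichever spelling the original text used.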