IPADIC dictionary builder for Lindera. This project is forked from fulmicoton's kuromoji-rs.
Install the builder with Cargo:

```shell
% cargo install lindera-ipadic-builder
```
Building from source requires a Rust toolchain with Cargo:

```shell
% cargo build --release
```
This repository contains mecab-ipadic-2.7.0-20070801.
Build a dictionary with the `lindera-ipadic` command:

```shell
% curl -L -O "http://jaist.dl.sourceforge.net/project/mecab/mecab-ipadic/2.7.0-20070801/mecab-ipadic-2.7.0-20070801.tar.gz"
% tar zxvf ./mecab-ipadic-2.7.0-20070801.tar.gz
% lindera-ipadic ./mecab-ipadic-2.7.0-20070801 ./lindera-ipadic-2.7.0-20070801
```

The first argument is the unpacked MeCab IPADIC source directory and the second is the output directory for the built dictionary.
Refer to the manual for details on the IPADIC dictionary format and part-of-speech tags.
| Index | Name (Japanese) | Name (English) | Notes |
| --- | --- | --- | --- |
| 0 | 品詞 | part-of-speech | |
| 1 | 品詞細分類1 | sub POS 1 | |
| 2 | 品詞細分類2 | sub POS 2 | |
| 3 | 品詞細分類3 | sub POS 3 | |
| 4 | 活用型 | conjugation type | |
| 5 | 活用形 | conjugation form | |
| 6 | 原形 | base form | |
| 7 | 読み | reading | |
| 8 | 発音 | pronunciation | |
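The table gives the position of each field in the comma-separated details that follow a token. As an illustration only, the shell sketch below splits such a string with `awk`. The helper names `ipadic_field` and `ipadic_describe` are hypothetical and not part of Lindera; the example string is the `羽田空港` entry from the sample tokenizer output in this README.

```shell
# Illustrative sketch: split an IPADIC feature string into the nine fields
# listed in the table. ipadic_field and ipadic_describe are hypothetical
# helpers written for this example, not part of Lindera.

# Print field number $2 (0-based, as in the table) of feature string $1.
ipadic_field() {
    printf '%s\n' "$1" | awk -F',' -v i="$2" '{ print $(i + 1) }'
}

# Print "index<TAB>English name<TAB>value" for every field.
ipadic_describe() {
    printf '%s\n' "$1" | awk -F',' '
        BEGIN {
            split("part-of-speech,sub POS 1,sub POS 2,sub POS 3,conjugation type,conjugation form,base form,reading,pronunciation", names, ",")
        }
        {
            for (i = 1; i <= 9; i++) printf "%d\t%s\t%s\n", i - 1, names[i], $i
        }'
}

features='名詞,固有名詞,一般,*,*,*,羽田空港,ハネダクウコウ,ハネダクーコー'
ipadic_field "$features" 7    # prints ハネダクウコウ (the reading)
ipadic_describe "$features"   # one line per field, indices 0 to 8
```

Unused fields are filled with `*`, so a plain split on commas is enough to address them by index.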
You can tokenize text with the built dictionary using the `lindera` command:

```shell
% echo "羽田空港限定トートバッグ" | lindera -d ./lindera-ipadic-2.7.0-20070801
```
```text
羽田空港 名詞,固有名詞,一般,*,*,*,羽田空港,ハネダクウコウ,ハネダクーコー
限定 名詞,サ変接続,*,*,*,*,限定,ゲンテイ,ゲンテイ
トートバッグ UNK,*,*,*,*,*,*,*,*
EOS
```
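To post-process this output, a small filter can pick out the reading (index 7 in the field table). The sketch below uses a hypothetical helper named `readings`, not a Lindera feature. It assumes each line holds the surface form, whitespace, then the comma-separated details, with `EOS` closing the sentence, as in the sample output. Tokens tagged `UNK` carry only `*` placeholders, so they are reported as unknown.

```shell
# Illustrative sketch: turn tokenizer output into "surface<TAB>reading" lines.
# Assumes one token per line as "<surface><whitespace><details>", ended by EOS.
# readings is a hypothetical helper name, not part of Lindera.
readings() {
    awk '
        NF == 0 || $1 == "EOS" { next }      # skip blank lines and EOS markers
        {
            n = split($2, f, ",")            # f[8] holds the reading (index 7)
            reading = (f[1] == "UNK" || n < 8) ? "(unknown)" : f[8]
            printf "%s\t%s\n", $1, reading
        }'
}

readings <<'EOF'
羽田空港 名詞,固有名詞,一般,*,*,*,羽田空港,ハネダクウコウ,ハネダクーコー
限定 名詞,サ変接続,*,*,*,*,限定,ゲンテイ,ゲンテイ
トートバッグ UNK,*,*,*,*,*,*,*,*
EOF
```

With a built dictionary, the filter could be fed directly, e.g. `echo "羽田空港限定トートバッグ" | lindera -d ./lindera-ipadic-2.7.0-20070801 | readings`, provided the CLI output matches the format above. Because `awk` splits on any run of spaces or tabs by default, either separator between the surface form and the details should work.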
For more details about the `lindera` command, please refer to the following URL:
The API reference is available at the following URL:

- lindera-ipadic-builder