OpenFst: creating FSTs from a list of words

I'm reading the first example, about tokenization, at http://www.openfst.org/twiki/bin/view/FST/FstExamples.

In the example, they create three FSTs (Mars.fst, Martian.fst, and man.fst) and manually run some fst commands to merge them into one big transducer. They take the words "Mars", "Martian", and "man" from wotw.syms, which contains 7102 words.

My question: is there a smart way to create a word FST for each of the 7102 words, so that all of them can be combined into one big automaton, or does it have to be done manually, as they did for the three words Martian, Mars, and man?

They provide a script for this, makelex.py: https://www.openfst.org/twiki/pub/FST/FstExamples/makelex.py.txt. With it, you can simply run:

cat wotw.syms | python2 makelex.py > lexicon_text.fst
fstcompile --isymbols=ascii.syms --osymbols=wotw.syms lexicon_text.fst lexicon.fst
fstrmepsilon lexicon.fst | fstdeterminize | fstminimize >lexicon_opt.fst
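
To give an idea of what such a script has to produce, here is a minimal Python 3 sketch that writes a text-format FST for a whole word list. It is not the actual makelex.py, whose contents are not reproduced here. The function names read_words and lexicon_to_text_fst are hypothetical. The sketch assumes the symbol table has one "symbol id" pair per line, that `<epsilon>` is the epsilon symbol, and that every character of every word is listed in ascii.syms. It only builds the union of the word paths, with the word as output on the first arc, in the same layout as the hand-written Mars.fst in the example.

```python
import sys


def read_words(symbol_table_lines):
    """Yield the first column of each 'symbol id' line, skipping <epsilon>.

    Assumption: the symbol table is whitespace-separated, one symbol per line.
    """
    for line in symbol_table_lines:
        fields = line.split()
        if not fields or fields[0] == "<epsilon>":
            continue
        yield fields[0]


def lexicon_to_text_fst(words):
    """Return text-format FST lines with one path per word, all leaving state 0.

    Each arc line is 'src dst input_symbol output_symbol'. The first arc of a
    path outputs the whole word, the remaining arcs output <epsilon>.
    """
    lines = []
    final_states = []
    next_state = 1  # state 0 is the shared start state
    for word in words:
        src = 0
        for i, ch in enumerate(word):
            dst = next_state
            next_state += 1
            olabel = word if i == 0 else "<epsilon>"
            lines.append(f"{src} {dst} {ch} {olabel}")
            src = dst
        final_states.append(src)
    # A line holding only a state number marks that state as final.
    lines.extend(str(state) for state in final_states)
    return lines


if __name__ == "__main__":
    # Reads a symbol table on stdin, writes the text FST to stdout.
    print("\n".join(lexicon_to_text_fst(read_words(sys.stdin))))
```

If the assumptions hold, the output could take the place of lexicon_text.fst in the fstcompile command above. The resulting machine is nondeterministic at state 0, which is why the fstrmepsilon, fstdeterminize, and fstminimize step is still worth running afterwards.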
