
Make new sentences with an n-gram model using NLTK

I made 2-gram and 3-gram models from my text file.

import string
import nltk
from nltk import word_tokenize

text = open('Alice in Wonderland.txt', 'r').read()
# strip punctuation (Python 3: str.maketrans builds a deletion table)
text = text.translate(str.maketrans('', '', string.punctuation))
tokens = word_tokenize(text.lower())
bigram = list(nltk.bigrams(tokens))
trigram = list(nltk.trigrams(tokens))

But how can I generate new sentences using these models?

NLTK's generate() function is currently deprecated because it is broken; see https://github.com/nltk/nltk/issues/1180

A state-of-the-art alternative is text generation using recurrent neural nets, e.g. https://github.com/karpathy/char-rnn (note: unlike a traditional n-gram-based hidden Markov model, char-RNN does not use n-gram information).

Alternatively, you can implement your own hidden Markov model; see http://fulmicoton.com/posts/shannon-markov/
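
To make that last suggestion concrete, here is a minimal sketch of a trigram-based sampler built on the tokens from the question. It uses NLTK's trigrams plus a plain dictionary of counts; the seed bigram ('alice', 'was') and the use of random.choices for weighted sampling are assumptions made for illustration, not part of the linked post.

import random
from collections import defaultdict
import nltk

def build_trigram_model(tokens):
    # counts[(w1, w2)][w3] = how often w3 follows the bigram (w1, w2)
    counts = defaultdict(lambda: defaultdict(int))
    for w1, w2, w3 in nltk.trigrams(tokens):
        counts[(w1, w2)][w3] += 1
    return counts

def generate_sentence(counts, seed=('alice', 'was'), max_len=20):
    # seed must be a bigram that actually occurs in the tokens
    w1, w2 = seed
    out = [w1, w2]
    for _ in range(max_len):
        followers = counts.get((w1, w2))
        if not followers:
            break
        # sample the next word in proportion to its observed count
        nxt = random.choices(list(followers), weights=list(followers.values()))[0]
        out.append(nxt)
        w1, w2 = w2, nxt
    return ' '.join(out)

counts = build_trigram_model(tokens)   # tokens from the question's code
print(generate_sentence(counts))

Each run produces a different, often nonsensical but locally fluent, sentence, which is the expected behaviour of a plain n-gram sampler.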

