
Find rhyme using NLTK in Python

I have a poem, and I want Python code that prints only the words that rhyme with each other.

So far I am able to:

  1. Tokenize the poem's sentences using wordpunct_tokenize()
  2. Clean the words by removing the punctuation marks
  3. Store the last word of each sentence of the poem in a list
  4. Build another list from cmudict.entries() whose elements are those last words paired with their pronunciations.
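For reference, steps 1 to 3 can be sketched roughly as follows. This is a minimal illustration, not the exact code in use: the helper name `last_words` and the sample poem are made up, and a regular expression stands in for `wordpunct_tokenize()` so the snippet runs without NLTK (to my knowledge it is the same pattern that tokenizer is built on).

```python
import re

def last_words(poem):
    """Return the last alphabetic word of each non-empty line, lowercased."""
    result = []
    for line in poem.splitlines():
        # Stand-in for wordpunct_tokenize(): runs of word characters or runs of punctuation
        tokens = re.findall(r"\w+|[^\w\s]+", line)
        # Drop punctuation tokens and normalize case
        words = [t.lower() for t in tokens if t.isalpha()]
        if words:
            result.append(words[-1])
    return result

# Hypothetical sample poem, used only for illustration
poem = """Twinkle, twinkle, little star,
How I wonder what you are!
Up above the world so high,
Like a diamond in the sky."""

print(last_words(poem))  # ['star', 'are', 'high', 'sky']
```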

I am stuck on the next step: how should I match those pronunciations? Ultimately, my main task is to determine whether two given words rhyme: return True if they do, and False otherwise.
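One possible way to frame the comparison, sketched under some assumptions: treat two words as rhyming when their pronunciations are identical from the last stressed vowel to the end. That definition is an assumption on my part (a common notion of a perfect rhyme), and `SAMPLE_PRONUNCIATIONS` is a small hand-typed table in the ARPAbet format that cmudict uses, not the real dictionary. With NLTK installed, `nltk.corpus.cmudict.dict()` could supply the actual table.

```python
# Hypothetical sample: a few pronunciations typed in the ARPAbet format cmudict uses.
# Each word maps to a list of pronunciations; each pronunciation is a list of phonemes.
SAMPLE_PRONUNCIATIONS = {
    "star": [["S", "T", "AA1", "R"]],
    "are": [["AA1", "R"], ["ER0"]],
    "high": [["HH", "AY1"]],
    "sky": [["S", "K", "AY1"]],
}

def rhyming_part(phones):
    """Return the phonemes from the last stressed vowel to the end of the word."""
    for i in range(len(phones) - 1, -1, -1):
        # Vowels carry a trailing stress digit; 1 and 2 mean primary and secondary stress
        if phones[i][-1] in "12":
            return phones[i:]
    return phones  # no stressed vowel found: fall back to the whole pronunciation

def do_they_rhyme(word1, word2, prons=SAMPLE_PRONUNCIATIONS):
    """True if any pronunciation of word1 shares its rhyming part with any of word2."""
    parts1 = {tuple(rhyming_part(p)) for p in prons.get(word1.lower(), [])}
    parts2 = {tuple(rhyming_part(p)) for p in prons.get(word2.lower(), [])}
    return bool(parts1 & parts2)

print(do_they_rhyme("star", "are"))  # True
print(do_they_rhyme("high", "sky"))  # True
print(do_they_rhyme("star", "sky"))  # False
```

Words missing from the table simply come back as False here; real code would probably want to distinguish "unknown word" from "does not rhyme".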

The Pronouncing library does a great job of that. It needs no hacking, loads quickly, and is based on the CMU Pronouncing Dictionary, so it's reliable.

https://pypi.python.org/pypi/pronouncing

From their documentation:

>>> import pronouncing
>>> pronouncing.rhymes("climbing")
['diming', 'liming', 'priming', 'rhyming', 'timing']
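To get the True/False answer the question asks for, a thin wrapper around `pronouncing.rhymes()` might be enough. This is only a sketch: the function name `do_they_rhyme` and the `rhymes_of` parameter are hypothetical, the latter added so the logic can be tried without the library installed.

```python
def do_they_rhyme(word1, word2, rhymes_of=None):
    """Return True if word2 appears in the rhyme list reported for word1."""
    if rhymes_of is None:
        # Third-party dependency, imported lazily: pip install pronouncing
        import pronouncing
        rhymes_of = pronouncing.rhymes
    # Lowercase both words before the lookup and comparison
    return word2.lower() in rhymes_of(word1.lower())
```

Given the documented output above, `do_they_rhyme("climbing", "timing")` should come back True when the library is installed.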

Here is a way I found to find rhymes for a given word using NLTK:

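import nltk

def rhyme(inp, level):
    # Requires the CMU dictionary data: nltk.download('cmudict')
    entries = nltk.corpus.cmudict.entries()
    # Every pronunciation (a list of phonemes) recorded for the input word
    syllables = [(word, syl) for word, syl in entries if word == inp]
    rhymes = []
    for (word, syllable) in syllables:
        # Keep each word whose last `level` phonemes match those of the input
        rhymes += [word for word, pron in entries if pron[-level:] == syllable[-level:]]
    return set(rhymes)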

where inp is a word and level is the number of trailing phonemes that must match, i.e. how good the rhyme should be.
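To see what level does in practice, here is a small self-contained sketch of the same idea. `SAMPLE_ENTRIES` is a hypothetical hand-typed list in the (word, phonemes) shape that `cmudict.entries()` yields, and the suffix index is my own variation, added so the dictionary is scanned once per call instead of once per pronunciation.

```python
from collections import defaultdict

# Hypothetical sample in the (word, phoneme list) shape that cmudict.entries() yields
SAMPLE_ENTRIES = [
    ("climbing", ["K", "L", "AY1", "M", "IH0", "NG"]),
    ("timing", ["T", "AY1", "M", "IH0", "NG"]),
    ("sing", ["S", "IH1", "NG"]),
    ("ring", ["R", "IH1", "NG"]),
]

def build_suffix_index(entries, level):
    """Map each tuple of the last `level` phonemes to the words ending in it."""
    index = defaultdict(set)
    for word, pron in entries:
        index[tuple(pron[-level:])].add(word)
    return index

def rhyme(inp, level, entries=SAMPLE_ENTRIES):
    """Return all words sharing their last `level` phonemes with `inp`."""
    index = build_suffix_index(entries, level)
    found = set()
    for word, pron in entries:
        if word == inp:
            found |= index[tuple(pron[-level:])]
    return found

print(sorted(rhyme("timing", 1)))  # ['climbing', 'ring', 'sing', 'timing']
print(sorted(rhyme("timing", 2)))  # ['climbing', 'timing']
```

With level 1 only the final phoneme has to match, so anything ending in NG counts. Level 2 already excludes "sing" and "ring", because their vowel carries a different stress marker (IH1 versus IH0). Like the version above, the result includes the input word itself.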

So you could use this function, and to check whether two words rhyme, just check whether one is in the other's set of allowed rhymes:

def doTheyRhyme(word1, word2):
    # first, we don't want to report 'glue' and 'unglue' as rhyming words
    # that kind of rhyme is LAME
    # endswith() is used because str.find() returns -1 when there is no match,
    # which can accidentally equal the length difference
    if word1.endswith(word2) or word2.endswith(word1):
        return False

    return word1 in rhyme(word2, 1)
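A note on the suffix check, since it is easy to get subtly wrong: comparing the result of `str.find()` with a length difference can misfire, whereas `str.endswith()` tests exactly what is wanted. The two helper names below are made up for this illustration.

```python
def is_suffix_via_find(word, suffix):
    # find() returns the index of the FIRST occurrence, or -1 when there is none
    return word.find(suffix) == len(word) - len(suffix)

def is_suffix(word, suffix):
    # endswith() checks the end of the string directly
    return word.endswith(suffix)

print(is_suffix_via_find("unglue", "glue"))  # True, as intended
print(is_suffix_via_find("cat", "bats"))     # True by accident: -1 == 3 - 4
print(is_suffix_via_find("banana", "na"))    # False: first "na" is at index 2, not 4
print(is_suffix("cat", "bats"))              # False
print(is_suffix("banana", "na"))             # True
```

The second case matters here: with a find-based check, "cat" and "bats" would be rejected before their pronunciations were ever compared.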

You could also use a phonetic algorithm such as Soundex or Double Metaphone to find out whether they rhyme. NLTK doesn't seem to implement these, but a quick Google search turned up some implementations.

