
Repeated phrases in text (Python): follow-up

Another user already asked how to find repeated phrases in a text with Python, but that question focused only on three-word phrases.

Robert Rossney's answer there (repeated phrases in the text Python) was complete and works, but is there a method that finds repeated phrases regardless of their length? I think the method from the previous discussion could be extended, but I am not quite sure how to do it.

I think this is the function that would need to be modified to return tuples of different lengths:

def phrases(words):
    phrase = []
    for word in words:
        phrase.append(word)
        if len(phrase) > 3:
            phrase.remove(phrase[0])
        if len(phrase) == 3:
            yield tuple(phrase)
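For comparison, here is a minimal sketch of the same sliding window written with collections.deque, assuming words is any iterable of already-tokenized strings. A deque created with maxlen drops its oldest item automatically when a new one is appended to a full window, which replaces the manual remove step. The name sliding_phrases and the parameter n are hypothetical, not from the original thread.

```python
from collections import deque

def sliding_phrases(words, n=3):
    # Hypothetical helper: holds at most the n most recent words.
    window = deque(maxlen=n)
    for word in words:
        window.append(word)  # when full, the leftmost word is discarded
        if len(window) == n:
            yield tuple(window)

print(list(sliding_phrases(['oer', 'the', 'bright', 'blue', 'sea'])))
# [('oer', 'the', 'bright'), ('the', 'bright', 'blue'), ('bright', 'blue', 'sea')]
```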

One simple modification is to pass the phrase length (the number of words, wlen) to phrases and then call it with different lengths.

def phrases(words, wlen):
    # Slide a window of wlen consecutive words across the text.
    phrase = []
    for word in words:
        phrase.append(word)
        if len(phrase) > wlen:
            phrase.pop(0)  # drop the oldest word to keep the window at wlen
        if len(phrase) == wlen:
            yield tuple(phrase)
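If words is a list (so it supports slicing) rather than an arbitrary iterator, the same windows of any length can also be produced by zipping shifted slices. This is a swapped-in idiom, not part of the answer above, and ngrams is a hypothetical name.

```python
def ngrams(words, n):
    # words[0:], words[1:], ..., words[n-1:] zipped together give each
    # run of n consecutive words; zip stops at the shortest slice, so
    # every tuple has exactly n words and n > len(words) yields nothing.
    return list(zip(*(words[i:] for i in range(n))))

print(ngrams(['oer', 'the', 'bright', 'blue', 'sea'], 2))
# [('oer', 'the'), ('the', 'bright'), ('bright', 'blue'), ('blue', 'sea')]
```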

Then define all_phrases as:

def all_phrases(words):
    # One generator per phrase length, from 1 to len(words) - 1
    for l in range(1, len(words)):
        yield phrases(words, l)
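Because all_phrases yields one generator per length, the caller needs a nested loop. A minimal sketch of a flattened variant, assuming Python 3.3+ for yield from, returns the tuples directly; all_phrases_flat is a hypothetical name, and phrases is repeated here only so the block runs on its own.

```python
def phrases(words, wlen):
    # Same sliding window as in the answer above.
    phrase = []
    for word in words:
        phrase.append(word)
        if len(phrase) > wlen:
            phrase.pop(0)
        if len(phrase) == wlen:
            yield tuple(phrase)

def all_phrases_flat(words):
    # Hypothetical variant: same lengths as all_phrases, but yields the
    # tuples themselves instead of one generator per length.
    for length in range(1, len(words)):
        yield from phrases(words, length)

print(list(all_phrases_flat(['a', 'b', 'c'])))
# [('a',), ('b',), ('c',), ('a', 'b'), ('b', 'c')]
```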

One way to use it (print(g) works in both Python 2 and Python 3):

for w in all_phrases(words):
    for g in w:
        print(g)

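For words = ['oer', 'the', 'bright', 'blue', 'sea'], it produces the output below. The full five-word sequence is not included because range(1, len(words)) stops at len(words) - 1; this is harmless, since a phrase as long as the whole text can occur only once and therefore can never be repeated.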

('oer',)
('the',)
('bright',)
('blue',)
('sea',)
('oer', 'the')
('the', 'bright')
('bright', 'blue')
('blue', 'sea')
('oer', 'the', 'bright')
('the', 'bright', 'blue')
('bright', 'blue', 'sea')
('oer', 'the', 'bright', 'blue')
('the', 'bright', 'blue', 'sea')
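The listing above enumerates every phrase but does not yet say which ones repeat. A minimal sketch of that last step, assuming the text is already split into a list of words (no lowercasing or punctuation stripping), counts each tuple with collections.Counter and keeps those seen more than once. The function repeated_phrases and its min_len parameter are hypothetical and not taken from the original thread; min_len=2 skips single repeated words such as "the".

```python
from collections import Counter

def phrases(words, wlen):
    # Same sliding window as in the answer above.
    phrase = []
    for word in words:
        phrase.append(word)
        if len(phrase) > wlen:
            phrase.pop(0)
        if len(phrase) == wlen:
            yield tuple(phrase)

def repeated_phrases(words, min_len=2):
    # Count every phrase of length min_len .. len(words) - 1,
    # then keep only the ones that occur more than once.
    counts = Counter()
    for length in range(min_len, len(words)):
        counts.update(phrases(words, length))
    return {p: c for p, c in counts.items() if c > 1}

text = "the cat sat on the mat and the cat sat down"
print(repeated_phrases(text.split()))
# {('the', 'cat'): 2, ('cat', 'sat'): 2, ('the', 'cat', 'sat'): 2}
```

Counting overlapping occurrences this way means "a a a" reports ('a', 'a') twice; if only non-overlapping repeats matter, the counting step would need to track positions instead.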
