
Is there a way to keep count of the number of occurrences of word pairs in a list of sentences?

I have a list of word pairs, and I have to check whether each of these word pairs occurs in each sentence of a list of sentences. For example, the list of word pairs is something like this:

[(mary,little),(mary,lamb),(mary,jack),(mary,jill),(little,lamb),(little,Jack),(little,Jill),(lamb,jack),(lamb,jill),(jack,jill)]

The list of sentences is:

['Mary had a little lamb','Jack and Jill went up the hill']

The output should give, for each sentence, the number of occurrences of each word pair. In this example, the first sentence will have the following word-pair counts:

[(mary,little):1,(mary,lamb):1,(mary,jack):0,(mary,jill):0,(little,lamb):1,(little,Jack):0,(little,Jill):0,(lamb,jack):0,(lamb,jill):0,(jack,jill):0]

The same goes for the second sentence. The output can also be presented in tabular form, with the sentences in one column and the word pairs as the other columns.

The pairs need to be strings as well:

pairs = [('mary','little'),('mary','lamb'),('mary','jack'),('mary','jill'),('little','lamb'),('little','Jack'),('little','Jill'),('lamb','jack'),('lamb','jill'),('jack','jill')]

sents = ['Mary had a little lamb','Jack and Jill went up the hill']

Lowercase the selected sentence with the .lower() method, since the word matching should be case-insensitive. Then use a list comprehension to build the list of pairs whose two words both appear in the sentence.

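sent = sents[0].lower()
# Split into words so that e.g. 'jack' does not match inside 'hijack';
# lowercase the pair words too, since some of them ('Jack', 'Jill') are capitalized.
words = set(sent.split())
matches = [p for p in pairs if p[0].lower() in words and p[1].lower() in words]

# 1 if the pair matched, 0 otherwise, for every pair in the original order
d = {}

for p in pairs:
    if p in matches:
        d[p] = 1
    else:
        d[p] = 0

print(d)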
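The snippet above handles only sents[0]. A minimal sketch that repeats the check for every sentence and prints the tabular layout described in the question (one row per sentence, one column per pair) might look like this. The helper name pair_flags is hypothetical, and the sketch assumes words are separated by whitespace with no punctuation attached.

```python
# Pairs and sentences from the question; word matching is case-insensitive.
pairs = [('mary', 'little'), ('mary', 'lamb'), ('mary', 'jack'), ('mary', 'jill'),
         ('little', 'lamb'), ('little', 'Jack'), ('little', 'Jill'),
         ('lamb', 'jack'), ('lamb', 'jill'), ('jack', 'jill')]
sents = ['Mary had a little lamb', 'Jack and Jill went up the hill']


def pair_flags(sentence, pairs):
    """Hypothetical helper: map each pair to 1 if both words occur in the sentence, else 0.

    Assumes whitespace-separated words without attached punctuation.
    """
    words = set(sentence.lower().split())
    return {p: int(p[0].lower() in words and p[1].lower() in words) for p in pairs}


# One dict per sentence, in the same order as sents.
rows = [pair_flags(s, pairs) for s in sents]

# Print a simple tab-separated table: one row per sentence, one column per pair.
headers = ['sentence'] + ['-'.join(p) for p in pairs]
print('\t'.join(headers))
for s, row in zip(sents, rows):
    print('\t'.join([s] + [str(row[p]) for p in pairs]))
```

Each row is a plain dict keyed by the pair tuples, so the same rows can be handed to any table or CSV writer if a different layout is preferred.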

If you are a fan of one-liners, you can use this:

m_str = 'Mary had a little lamb'
m_dict = {item: 1 if (item[0] in m_str.lower().split(' ') and item[1] in m_str.lower().split(' ')) else 0 for item in pairs}

Output:

{('mary', 'little'): 1, ('mary', 'lamb'): 1, ('mary', 'jack'): 0, ('mary', 'jill'): 0, ('little', 'lamb'): 1, ('little', 'Jack'): 0, ('little', 'Jill'): 0, ('lamb', 'jack'): 0, ('lamb', 'jill'): 0, ('jack', 'jill'): 0}

This solution produces only 1 or 0, depending on whether both words of a pair appear in the sentence. I didn't completely understand what a count of pairs means (for example, if the first word appears twice and the second only once, what should the count be?), but any change in the requirements should need only small changes to the code.
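If the count should reflect repeated words, one possible interpretation (an assumption, not something the question specifies) is to count a pair as min(count of first word, count of second word), i.e. how many pairs can be formed without reusing a word. Under that rule, twice the first word and once the second gives 1. A sketch using collections.Counter, with the hypothetical helper name pair_counts and whitespace-separated, punctuation-free sentences assumed:

```python
from collections import Counter


def pair_counts(sentence, pairs):
    """Hypothetical helper: count each pair as min(count(word1), count(word2)).

    The min-based rule is only an assumed reading of "count of a pair":
    it counts how many (word1, word2) pairs can be formed without reusing a word.
    """
    counts = Counter(sentence.lower().split())
    # Counter returns 0 for missing words, so absent pairs come out as 0.
    return {p: min(counts[p[0].lower()], counts[p[1].lower()]) for p in pairs}


pairs = [('mary', 'lamb'), ('jack', 'jill')]
print(pair_counts('Mary had a little lamb and Mary had another lamb', pairs))
# {('mary', 'lamb'): 2, ('jack', 'jill'): 0}
```

If every combination should count instead, replacing min(...) with the product of the two counts would give 4 for ('mary', 'lamb') in the example sentence.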


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM