
How to match a list of words against sentences and output the matches in ID - word format

I have a bunch of IDs, each with a sentence. I need to compare this data against a list of words. I want the output to give each ID together with the words from its sentence that appear in the word list.

I tried to do this in Excel, by using Text to Columns, transposing the list, and then applying conditional formatting. But it is really not feasible, as some sentences contain many words and there are a lot of sentences.

Is there a way to do this with Python?

Input:

 | ID | data                  |
 |----|-----------------------|
 | 1  | hello can you hear me |
 | 2  | roses are red         |
 | 3  | water is life         |
 | 4  | pie                   |
 | 5  | I love chicken pie    |

 | List    |
 |---------|
 | hello   |
 | love    |
 | water   |
 | roses   |
 | pie     |
 | chicken |
 | hear    |
 | red     |

Expected Output:

 | ID | data   |
 |----|--------|
 | 1  | hello  |
 | 1  | hear   |
 | 2  | roses  |
 | 2  | red    |
 | 3  | water  |
 | 4  | pie    |
 | 5  | love   |
 | 5  | chicken|
 | 5  | pie    |

Supposing you have a csv table of IDs and sentences sentences.csv , and a text file with a list of words words.txt , you could do the following:

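import csv
import sys

# Load the word list into a set for fast membership tests
with open('words.txt') as f:
    words = set(line.strip() for line in f)

table = []
with open('sentences.csv', newline='') as f:
    # Each row is expected to be exactly: id, sentence
    for sid, sentence in csv.reader(f):
        table += [[word, sid] for word in sentence.split() if word in words]

# Print the matches to standard output as CSV rows
csv.writer(sys.stdout).writerows(table)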

This is a compact way to express the idea, and it does little in the way of error checking. For example, if a line in the CSV file does not have exactly two cells, unpacking it in the for loop raises a ValueError. Even more briefly, the table can be built with a single nested list comprehension:

 table = [[word,sid] for sid,sentence in csv.reader(open('sentences.csv'))
                     for word in sentence.split() if word in words]

Both versions build the same table, which is written out as follows (note that each row is word,ID rather than ID,word):

hello,1
hear,1
roses,2
red,2
water,3
pie,4
love,5
chicken,5
pie,5
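
For messier input, a more defensive variant may be useful. The following is only a sketch under a few assumptions that are not in the original answer: rows without exactly two cells are skipped silently, matching ignores case and surrounding punctuation, and each result row puts the ID first to mirror the table in the question. The function name match_words and the inline sample data are hypothetical and exist only for illustration.

```python
import csv
import io
import string


def match_words(rows, vocabulary):
    """Return [sentence_id, word] pairs for each sentence word found in vocabulary."""
    # Normalise the word list once: trimmed, lower-cased, blanks dropped
    vocab = {w.strip().lower() for w in vocabulary if w.strip()}
    matches = []
    for row in rows:
        if len(row) != 2:
            continue  # assumption: malformed rows are skipped, not reported
        sid, sentence = row
        for token in sentence.split():
            # assumption: "Pie," and "pie" should count as the same word
            word = token.strip(string.punctuation).lower()
            if word in vocab:
                matches.append([sid.strip(), word])
    return matches


# Sample data from the question, held in memory so the sketch runs without files
sentences_csv = io.StringIO(
    "1,hello can you hear me\n"
    "2,roses are red\n"
    "3,water is life\n"
    "4,pie\n"
    "this row has no comma\n"
    "5,I love chicken pie\n"
)
word_list = ["hello", "love", "water", "roses", "pie", "chicken", "hear", "red"]

result = match_words(csv.reader(sentences_csv), word_list)
for sid, word in result:
    print(sid, word)
```

With real files, the same function can be fed csv.reader(open('sentences.csv', newline='')) and the lines of words.txt. Whether skipping bad rows is the right choice depends on the data; raising an error or logging the row number are equally reasonable alternatives.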
