I have a set of IDs, each with a sentence. I need to compare this data against a list of words, and I want the output to show each ID alongside the words from its sentence that appear in the list.
I tried doing this in Excel by using text to columns, transposing the list, and then applying conditional formatting. But that is not really feasible, because a sentence can contain many words and there are a lot of sentences.
Is there a way to do this in Python?
Input:
| ID | data                  |
|----|-----------------------|
| 1  | hello can you hear me |
| 2  | roses are red         |
| 3  | water is life         |
| 4  | pie                   |
| 5  | I love chicken pie    |

List:
- hello
- love
- water
- roses
- pie
- chicken
- hear
- red
Expected Output:
| ID | data |
|----|--------|
| 1 | hello |
| 1 | hear |
| 2 | roses |
| 2 | red |
| 3 | water |
| 4 | pie |
| 5 | love |
| 5 | chicken|
| 5 | pie |
Supposing you have a CSV table of IDs and sentences, sentences.csv, and a text file with a list of words, words.txt, you could do the following:
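    import csv
    import sys

    # Load the word list into a set for fast membership tests.
    words = set(l.strip() for l in open('words.txt'))
    table = []
    with open('sentences.csv') as f:
        for sid, sentence in csv.reader(f):
            # Keep each word of the sentence that appears in the word list.
            table += [[word, sid] for word in sentence.split() if word in words]
    csv.writer(sys.stdout).writerows(table)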
This is a compact way to express it, and it does not do much in the way of error checking. For example, if a line in the CSV file does not have exactly two cells, unpacking it in the loop raises a ValueError. Even more briefly, the table parsing could be written as:
    table = [[word, sid] for sid, sentence in csv.reader(open('sentences.csv'))
             for word in sentence.split() if word in words]
Both produce the expected matches, written with the word first and the ID second:
hello,1
hear,1
roses,2
red,2
water,3
pie,4
love,5
chicken,5
pie,5
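If the input is less tidy, a slightly more defensive variant may help. The following is only a sketch under a few assumptions that are not part of the answer above: the helper name `match_words` is hypothetical, matching is made case-insensitive, punctuation around a word is stripped, rows without exactly two cells are skipped rather than raising an error, and the ID is emitted first to match the column order in the question. The in-memory data stands in for sentences.csv and words.txt.

```python
import csv
import io
import string


def match_words(rows, words):
    """Yield (id, word) pairs for every sentence word found in `words`."""
    wanted = {w.lower() for w in words}
    for row in rows:
        if len(row) != 2:
            continue  # skip blank or malformed rows instead of crashing
        sid, sentence = row
        for token in sentence.split():
            # Strip surrounding punctuation so "pie." still matches "pie".
            token = token.strip(string.punctuation)
            if token.lower() in wanted:
                yield sid, token


# Hypothetical in-memory stand-ins for sentences.csv and words.txt.
sentences_csv = io.StringIO(
    "1,hello can you hear me\n"
    "2,roses are red\n"
    "malformed row without a comma\n"
    "5,I Love chicken pie.\n"
)
word_list = ["hello", "love", "water", "roses", "pie", "chicken", "hear", "red"]

pairs = list(match_words(csv.reader(sentences_csv), word_list))
for sid, word in pairs:
    print(sid, word)
```

To use it on real files, pass `csv.reader(f)` for an open sentences.csv and the stripped lines of words.txt in place of the sample data. Note that a sentence containing a comma would need to be quoted in the CSV file, otherwise it splits into more than two cells and is skipped here.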