Matching words between two CSV files
I have two CSV files, and for each row of the first file I want to count the words that match a word from the second file. For example:
text.csv
text number
0 very nice house, and great garden 3
1 the book is very boring 4
2 it was very interesting final end 5
3 I have no idea which book do you prefer 4
words.csv
word score
0 boring -1.0
1 very -1.0
2 interesting 1.0
3 great 1.0
4 book 0.5
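A minimal sketch for loading the two tables, assuming text.csv and words.csv are ordinary comma-separated files with header rows `text,number` and `word,score` (the leading 0 to 4 shown above is the DataFrame index, not a column in the file). Because the first row's text contains a comma, that field has to be quoted in the CSV. The inline file contents below are an assumption, since the original files were not shown verbatim.

```python
import io

import pandas as pd

# Assumed contents of text.csv; the comma inside the first text forces quoting.
TEXT_CSV = """text,number
"very nice house, and great garden",3
the book is very boring,4
it was very interesting final end,5
I have no idea which book do you prefer,4
"""

# Assumed contents of words.csv.
WORDS_CSV = """word,score
boring,-1.0
very,-1.0
interesting,1.0
great,1.0
book,0.5
"""

# io.StringIO lets read_csv treat the strings as files.
df1 = pd.read_csv(io.StringIO(TEXT_CSV))
df2 = pd.read_csv(io.StringIO(WORDS_CSV))
print(df1)
print(df2)
```

With the real files on disk, `pd.read_csv('text.csv')` and `pd.read_csv('words.csv')` take the place of the `io.StringIO` wrappers.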
I want to count the words that match the second file and get the following output:
[[2,3], [3,4], [2,5], [1,4]]
For example, in [2,3], 2 is the number of matched words (very, great) and 3 is the value from the number column. What I tried is:
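import pandas as pd
df1 = pd.read_csv('text.csv')   # assumed columns: text, number
df2 = pd.read_csv('words.csv')  # assumed columns: word, score
matches = []
text = df1['text'].str.split()   # Series of token lists, one list per row
words = df2['word'].str.split()  # Series of one-element lists such as ['boring']
for word in text:                # `word` is a whole row's list of tokens, not a single word
    for item in words:
        if word == item:         # list-to-list comparison, so it never matches for this data
            matches.append([1, 1])  # would record neither the count nor the row's number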
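The loop above compares whole token lists rather than single words, and it never records the match count or the row's number. A corrected sketch of the same loop-based idea (not the accepted answer's method) puts the words from words.csv in a set and checks each token of a row against it. Stripping surrounding punctuation with `str.strip` is an assumption, so that a token such as `house,` would still match `house`. The DataFrames are built inline so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from CSV.
df1 = pd.DataFrame({
    'text': ['very nice house, and great garden',
             'the book is very boring',
             'it was very interesting final end',
             'I have no idea which book do you prefer'],
    'number': [3, 4, 5, 4],
})
df2 = pd.DataFrame({'word': ['boring', 'very', 'interesting', 'great', 'book'],
                    'score': [-1.0, -1.0, 1.0, 1.0, 0.5]})

vocab = set(df2['word'])  # set gives a constant-time membership test per token

matches = []
for text, number in zip(df1['text'], df1['number']):
    # Split the row into tokens and drop surrounding punctuation such as the comma in "house,"
    tokens = [token.strip('.,;:!?') for token in text.split()]
    count = sum(token in vocab for token in tokens)  # True counts as 1
    matches.append([count, number])

print(matches)  # [[2, 3], [3, 4], [2, 5], [1, 4]]
```

Unlike the regex answer, this version is case-sensitive in the same way but treats every whitespace-separated token on its own, which keeps it easy to adapt, for example to look up each matched word's score.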
Let us try str.count to count the occurrences of the words from df2 in each string of the column text in df1:
counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
matches = list(zip(counts, df1['number']))
>>> matches
[(2, 3), (3, 4), (2, 5), (1, 4)]
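To see what the f-string in the answer produces, the pattern can be built and printed on its own. It is a single alternation of every word in df2, wrapped in `\b` word boundaries, so `book` matches in "the book is" but not inside "bookshelf". This sketch also passes each word through `re.escape`, which the one-liner above does not do; that only matters if a word contains regex metacharacters such as `.` or `+`.

```python
import re

import pandas as pd

df2 = pd.DataFrame({'word': ['boring', 'very', 'interesting', 'great', 'book']})

# Same construction as the answer, with re.escape added as a safeguard
pattern = fr"\b({'|'.join(map(re.escape, df2['word']))})\b"
print(pattern)  # \b(boring|very|interesting|great|book)\b

# \b prevents partial matches: "bookshelf" does not count as "book"
s = pd.Series(['the book is very boring', 'my bookshelf'])
print(s.str.count(pattern).tolist())  # [3, 0]
```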
This solution worked for me :-)
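import pandas as pd
counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
# number was presumably read as strings; converting just that column replaces
# df1.applymap(lambda x: pd.to_numeric(x, errors='ignore')), whose applymap and errors='ignore' are deprecated in recent pandas
number = pd.to_numeric(df1['number'])
matches = list(zip(counts, number))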
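A regex-free alternative, not from the thread, is to split each text into tokens, `explode` them into one row per token, test membership with `isin`, and sum the matches back per original row. Like the regex, it counts repeated occurrences. Stripping punctuation with `str.strip` is an assumption about how tokens like `house,` should be treated.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from CSV.
df1 = pd.DataFrame({
    'text': ['very nice house, and great garden',
             'the book is very boring',
             'it was very interesting final end',
             'I have no idea which book do you prefer'],
    'number': [3, 4, 5, 4],
})
df2 = pd.DataFrame({'word': ['boring', 'very', 'interesting', 'great', 'book']})

# One row per token; the index still points at the original row of df1
tokens = df1['text'].str.split().explode().str.strip('.,;:!?')
# True where the token appears in words.csv; summing booleans per original row gives the count
counts = tokens.isin(df2['word']).groupby(level=0).sum()
matches = [[int(c), int(n)] for c, n in zip(counts, df1['number'])]
print(matches)  # [[2, 3], [3, 4], [2, 5], [1, 4]]
```

The trade-off is memory for clarity: `explode` materializes every token, while the regex version scans each string once.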