Matching words between 2 CSV files
I have two CSV files, and for each row of the first file I want to count the words that match a word in the second file. For example:
text.csv
text number
0 very nice house, and great garden 3
1 the book is very boring 4
2 it was very interesting final end 5
3 I have no idea which book do you prefer 4
words.csv
word score
0 boring -1.0
1 very -1.0
2 interesting 1.0
3 great 1.0
4 book 0.5
I want to count the words that match the second file and get output like this:
[[2,3], [3,4], [2,5], [1,4]]
For example, in [2,3], 2 is the number of matched words (very, great) and 3 is the value from the number column. What I tried is:
matches = []
text = df1['text'].str.split()
words = df2['word'].str.split()
for word in text:
    for item in words:
        if word == item:
            matches.append([1, 1])
Let's try str.count to count, in each string of the text column of df1, the occurrences of the words from df2:
counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
matches = list(zip(counts, df1['number']))
>>> matches
[(2, 3), (3, 4), (2, 5), (1, 4)]
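For anyone who wants to reproduce this without the CSV files, here is a self-contained sketch that rebuilds the two sample tables inline. The variable name pattern, the re.escape call, and the non-capturing group are additions that are not in the answer above; escaping only matters if a word in df2 contains regex metacharacters. The result is converted to nested lists to match the shape asked for in the question.

```python
import re

import pandas as pd

# Sample data copied from the question (built inline instead of read from CSV).
df1 = pd.DataFrame({
    "text": [
        "very nice house, and great garden",
        "the book is very boring",
        "it was very interesting final end",
        "I have no idea which book do you prefer",
    ],
    "number": [3, 4, 5, 4],
})
df2 = pd.DataFrame({
    "word": ["boring", "very", "interesting", "great", "book"],
    "score": [-1.0, -1.0, 1.0, 1.0, 0.5],
})

# One alternation of whole words; re.escape is an added safeguard
# (an assumption, not part of the original answer).
pattern = r"\b(?:" + "|".join(re.escape(w) for w in df2["word"]) + r")\b"

# Number of whole-word matches per row of the text column.
counts = df1["text"].str.count(pattern)

# Plain Python ints in nested lists, matching the requested output shape.
matches = [[int(c), int(n)] for c, n in zip(counts, df1["number"])]
print(matches)  # [[2, 3], [3, 4], [2, 5], [1, 4]]
```

Note that the match is case-sensitive and counts repeated occurrences, so a row containing "very very" would contribute 2.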
This solution worked for me :-)
counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
convert_df1 = df1.applymap(lambda x: pd.to_numeric(x, errors='ignore'))
matches = list(zip(counts, convert_df1['number']))
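If the extra conversion is needed because the number column was read as strings, a narrower option may be to convert only that column instead of every cell. This is just a sketch under that assumption; the two-row frame and the words list below are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical stand-in: 'number' arrived as strings (for example, read with dtype=str).
df1 = pd.DataFrame({
    "text": ["very nice house, and great garden", "the book is very boring"],
    "number": ["3", "4"],
})
words = ["boring", "very", "interesting", "great", "book"]

counts = df1["text"].str.count(r"\b(?:" + "|".join(words) + r")\b")

# Convert only the column that needs it; non-numeric values raise by default.
numbers = pd.to_numeric(df1["number"])

# tolist() yields plain Python ints rather than NumPy scalars.
matches = list(zip(counts.tolist(), numbers.tolist()))
print(matches)  # [(2, 3), (3, 4)]
```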