Matching words between 2 csv files

Question

I have two csv files, and for each row of the first file I want to count the words that match a word in the second file. For example:

text.csv

    text                                         number
0   very nice house, and great garden               3
1   the book is very boring                         4
2   it was very interesting final end               5
3   I have no idea which    book do you prefer      4

words.csv

       word              score
0      boring           -1.0
1      very             -1.0
2      interesting       1.0
3      great             1.0
4      book              0.5

I want to count the words that match the second file and get the following output:

[[2,3], [3,4], [2,5], [1,4]]

For example, in [2,3], 2 is the number of matching words (very, great) and 3 is the value from the number column. What I tried is:

matches=[]
text=df1['text'].str.split()
words=df2['word'].str.split()
for word in text:
    for item in words:
        if word== item:
            matches.append([1,1])

Answer 1

Let's try str.count to count, for each string in the text column of df1, the occurrences of the words from df2:

counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
matches = list(zip(counts, df1['number']))

>>> matches

[(2, 3), (3, 4), (2, 5), (1, 4)]
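To make the approach above easy to reproduce, here is a minimal self-contained sketch. It rebuilds the two sample tables from the question in memory instead of reading the csv files. Two details are additions that are not part of the answer: each word is passed through re.escape, on the assumption that words.csv could contain regex metacharacters, and the result is converted to nested lists so that it has the shape the question asked for.

```python
import re

import pandas as pd

# Sample data copied from the question; in practice these would come from pd.read_csv.
df1 = pd.DataFrame({
    "text": [
        "very nice house, and great garden",
        "the book is very boring",
        "it was very interesting final end",
        "I have no idea which    book do you prefer",
    ],
    "number": [3, 4, 5, 4],
})
df2 = pd.DataFrame({
    "word": ["boring", "very", "interesting", "great", "book"],
    "score": [-1.0, -1.0, 1.0, 1.0, 0.5],
})

# Build one alternation pattern; \b keeps matches on whole words only.
# re.escape is an addition (not in the answer) so that each word is matched literally.
pattern = r"\b(?:" + "|".join(map(re.escape, df2["word"])) + r")\b"

# Count the pattern occurrences in every row of the text column.
counts = df1["text"].str.count(pattern)

# Pair each count with the row's number, as plain Python ints in nested lists.
matches = [[int(c), int(n)] for c, n in zip(counts, df1["number"])]
print(matches)  # [[2, 3], [3, 4], [2, 5], [1, 4]]
```

Note that str.count counts every occurrence, so a word that appears twice in one row contributes 2 to that row's count.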

Answer 2

This solution works for me :-)

counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
convert_df1 = df1.applymap(lambda x: pd.to_numeric(x, errors='ignore'))
matches = list(zip(counts, convert_df1['number']))
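The answer does not say why the numeric conversion was needed. One plausible reason, and this is only an assumption, is that the number column had been read as strings. Under that assumption, a narrower sketch converts just that one column with pd.to_numeric rather than the whole frame. The small frames below are hypothetical stand-ins for the csv files.

```python
import pandas as pd

# Hypothetical data: 'number' holds strings, as it might after a loosely typed csv read.
df1 = pd.DataFrame({
    "text": ["the book is very boring", "very nice house"],
    "number": ["4", "3"],
})
df2 = pd.DataFrame({"word": ["boring", "very", "book"]})

# Same counting step as in the answers above.
counts = df1["text"].str.count(fr"\b({'|'.join(df2['word'])})\b")

# Convert only the column that is needed; invalid values raise an error by default.
numbers = pd.to_numeric(df1["number"])

# tolist() yields plain Python ints instead of NumPy scalars.
matches = list(zip(counts.tolist(), numbers.tolist()))
print(matches)  # [(3, 4), (1, 3)]
```

Converting a single column keeps the text column untouched and makes a malformed number fail loudly instead of being passed through silently.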

Answer 1
0 Accepted 2021-02-24 09:50:28

Answer 2
0 2021-03-03 11:17:38
