
Matching words between two CSV files

I have two CSV files. For each row of the first file, I want to count the words that match a word in the second file. For example:

text.csv

    text                                         number
0   very nice house, and great garden               3
1   the book is very boring                         4
2   it was very interesting final end               5
3   I have no idea which    book do you prefer      4

words.csv

       word              score
0      boring           -1.0
1      very             -1.0
2      interesting       1.0
3      great             1.0
4      book              0.5

I want to count the words that match the second file and get output like this:

[[2,3], [3,4], [2,5], [1,4]] 

For example, in [2,3] the 2 is the number of matching words (very, great) and the 3 is the value from the number column. What I tried is:

matches=[]
text=df1['text'].str.split()
words=df2['word'].str.split()
for word in text:
    for item in words:
        if word== item:
            matches.append([1,1])

Let's try str.count to count, for each string in the text column of df1, how many times the words from df2 occur:

counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
matches = list(zip(counts, df1['number']))

>>> matches

[(2, 3), (3, 4), (2, 5), (1, 4)]
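
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The two DataFrames are rebuilt by hand from the tables in the question rather than read from the CSV files. The re.escape call and the non-capturing group are additions that are not in the answer above; they only matter if a word contains regex metacharacters.

```python
import re
import pandas as pd

# Sample data reconstructed from the question's tables.
df1 = pd.DataFrame({
    "text": [
        "very nice house, and great garden",
        "the book is very boring",
        "it was very interesting final end",
        "I have no idea which    book do you prefer",
    ],
    "number": [3, 4, 5, 4],
})
df2 = pd.DataFrame({
    "word": ["boring", "very", "interesting", "great", "book"],
    "score": [-1.0, -1.0, 1.0, 1.0, 0.5],
})

# Escape each word so characters such as "." or "+" are matched literally,
# then join them into one alternation bounded by word boundaries.
pattern = r"\b(?:" + "|".join(re.escape(w) for w in df2["word"]) + r")\b"

# Number of pattern matches in each row of the text column.
counts = df1["text"].str.count(pattern)

# Build [count, number] pairs as plain Python ints.
matches = [[int(c), int(n)] for c, n in zip(counts, df1["number"])]
print(matches)
```

Note that the match is case-sensitive and counts repeated occurrences, so a row containing "very very" would contribute 2.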

This solution worked for me :-)

counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
convert_df1 = df1.applymap(lambda x: pd.to_numeric(x, errors='ignore'))
matches = list(zip(counts, convert_df1['number']))
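
The applymap call converts every cell of df1. If only the number column needs to be numeric, converting that one column may be enough. The sketch below assumes number was read from the CSV as strings. The two-row frame and the words list are hypothetical stand-ins for the real files. Recent pandas releases also deprecate DataFrame.applymap and errors='ignore', so this version avoids both.

```python
import pandas as pd

# Hypothetical frame where "number" was read as strings.
df1 = pd.DataFrame({
    "text": ["very nice house, and great garden", "the book is very boring"],
    "number": ["3", "4"],
})
words = ["boring", "very", "interesting", "great", "book"]

# Count whole-word matches per row, as in the answer above.
counts = df1["text"].str.count(r"\b(?:" + "|".join(words) + r")\b")

# Convert only the column that is needed; errors="coerce" turns
# unparseable values into NaN instead of leaving them as strings.
numbers = pd.to_numeric(df1["number"], errors="coerce")

matches = list(zip(counts.tolist(), numbers.tolist()))
print(matches)
```

With errors="coerce", a bad value shows up as NaN in the result, which is easier to spot than a string left in place.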

