Matching words between two CSV files

Question

I have two CSV files, and for each row of the first file I want to count the words that match a word from the second file. For example:

text.csv

    text                                         number
0   very nice house, and great garden               3
1   the book is very boring                         4
2   it was very interesting final end               5
3   I have no idea which book do you prefer         4

words.csv

       word              score
0      boring           -1.0
1      very             -1.0
2      interesting       1.0
3      great             1.0
4      book              0.5

I want to count the words in each row that match a word in the second file and get the following output:

[[2,3], [3,4], [2,5], [1,4]]

For example, in [2,3], 2 is the number of matched words (very, great) and 3 is the value from the number column. What I tried is:

matches=[]
text=df1['text'].str.split()
words=df2['word'].str.split()
for word in text:
    for item in words:
        if word== item:
            matches.append([1,1])

Answer 1

You can use str.count with a regex alternation built from df2['word'] to count how many times those words occur in each string of the text column of df1:

counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
matches = list(zip(counts, df1['number']))

>>> matches

[(2, 3), (3, 4), (2, 5), (1, 4)]
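
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The two DataFrames are rebuilt inline from the sample data in the question (in practice they would come from pd.read_csv). Two things are my own additions rather than part of the answer above: re.escape, as a precaution in case a word contains regex metacharacters, and the final list comprehension, which is only there to get plain lists of ints as in the requested output.

```python
import re
import pandas as pd

# Sample data copied from the question; normally loaded with pd.read_csv.
df1 = pd.DataFrame({
    "text": [
        "very nice house, and great garden",
        "the book is very boring",
        "it was very interesting final end",
        "I have no idea which book do you prefer",
    ],
    "number": [3, 4, 5, 4],
})
df2 = pd.DataFrame({
    "word": ["boring", "very", "interesting", "great", "book"],
    "score": [-1.0, -1.0, 1.0, 1.0, 0.5],
})

# Alternation of all words, wrapped in \b so only whole words match.
# re.escape is an added safeguard (assumption: words may contain regex symbols).
pattern = r"\b(?:" + "|".join(re.escape(w) for w in df2["word"]) + r")\b"

# Number of regex matches per row of the text column.
counts = df1["text"].str.count(pattern)

# Pair each count with the row's number, as plain Python ints.
matches = [[int(c), int(n)] for c, n in zip(counts, df1["number"])]
print(matches)  # [[2, 3], [3, 4], [2, 5], [1, 4]]
```

Note that this counts every occurrence, so a word repeated within one row is counted each time, and matching is case-sensitive unless you pass flags=re.IGNORECASE to str.count.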

Answer 2

This solution worked for me:-)

counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
convert_df1 = df1.applymap(lambda x: pd.to_numeric(x, errors='ignore'))
matches = list(zip(counts, convert_df1['number']))
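
If the point of the conversion is that the number column comes out of the CSV as strings, converting just that column is probably enough. The following is a minimal sketch under that assumption: the string values in number are made up for illustration, and counts is hard-coded from the expected output in the question. As far as I know, DataFrame.applymap and errors='ignore' are both deprecated in recent pandas releases, so this uses pd.to_numeric on the single column with errors='coerce', which turns unparseable values into NaN rather than leaving them untouched.

```python
import pandas as pd

# Assumption for illustration: the number column was read as strings.
df1 = pd.DataFrame({"number": ["3", "4", "5", "4"]})

# Convert only the column that needs it; bad values would become NaN.
numbers = pd.to_numeric(df1["number"], errors="coerce")

# Per-row match counts, hard-coded here from the question's expected output.
counts = [2, 3, 2, 1]

# int(n) would raise on NaN, so drop or fill bad rows first if any exist.
matches = [(c, int(n)) for c, n in zip(counts, numbers)]
print(matches)  # [(2, 3), (3, 4), (2, 5), (1, 4)]
```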


solution1
0 ACCEPTED 2021-02-24 09:50:28

solution2
0 2021-03-03 11:17:38
