I have two CSV files, and for every row of the first file I want to count the words that match a word from the second file. For example:
text.csv
text number
0 very nice house, and great garden 3
1 the book is very boring 4
2 it was very interesting final end 5
3 I have no idea which book do you prefer 4
words.csv
word score
0 boring -1.0
1 very -1.0
2 interesting 1.0
3 great 1.0
4 book 0.5
I want to count the words in each row that match a word in the second file and get the following output:
[[2,3], [3,4], [2,5], [1,4]]
For example, in [2,3], 2 is the number of matched words (very, great) and 3 is the value from the number column. What I tried is:
matches=[]
text=df1['text'].str.split()
words=df2['word'].str.split()
for word in text:
    for item in words:
        if word== item:
            matches.append([1,1])
Let us try str.count to count occurrences of the words from df2 in each string of the text column in df1:
counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
matches = list(zip(counts, df1['number']))
>>> matches
[(2, 3), (3, 4), (2, 5), (1, 4)]
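For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes the two frames look like the samples in the question (they are built inline here instead of being read from text.csv and words.csv). It also wraps each word in re.escape, which the snippet above does not do; that is only a precaution in case a word contains regex metacharacters such as "." or "+".

```python
import re
import pandas as pd

# Sample data copied from the question (built inline rather than read from CSV).
df1 = pd.DataFrame({
    "text": [
        "very nice house, and great garden",
        "the book is very boring",
        "it was very interesting final end",
        "I have no idea which book do you prefer",
    ],
    "number": [3, 4, 5, 4],
})
df2 = pd.DataFrame({
    "word": ["boring", "very", "interesting", "great", "book"],
    "score": [-1.0, -1.0, 1.0, 1.0, 0.5],
})

# One alternation of all words, anchored on word boundaries.
# re.escape is an added precaution, not part of the snippet above.
pattern = r"\b(?:" + "|".join(map(re.escape, df2["word"])) + r")\b"

# Number of whole-word matches per row of the text column.
counts = df1["text"].str.count(pattern)

# Pair each count with the row's number, as plain Python ints in nested lists.
matches = [[int(c), int(n)] for c, n in zip(counts, df1["number"])]
print(matches)  # [[2, 3], [3, 4], [2, 5], [1, 4]]
```

Note that the match is case-sensitive and counts repeated occurrences, so a word appearing twice in one row contributes 2.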
This solution worked for me:-)
counts = df1['text'].str.count(fr"\b({'|'.join(df2['word'])})\b")
convert_df1 = df1.applymap(lambda x: pd.to_numeric(x, errors='ignore'))
matches = list(zip(counts, convert_df1['number']))
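If the goal of the conversion is just to get plain numbers for the number column, a narrower variant is to convert only that column. The sketch below assumes, hypothetically, that number was read from the CSV as strings; the small frame and the words list are made up for illustration. Series.tolist() returns ordinary Python scalars, so the resulting tuples hold plain ints.

```python
import pandas as pd

# Hypothetical frame where "number" arrived as strings (e.g. from a CSV).
df1 = pd.DataFrame({
    "text": ["the book is very boring", "no match here"],
    "number": ["4", "7"],
})
words = ["boring", "very", "book"]  # illustrative subset of words.csv

# Count whole-word matches per row.
counts = df1["text"].str.count(r"\b(?:" + "|".join(words) + r")\b")

# Convert only the column that is needed; unparseable values become NaN.
numbers = pd.to_numeric(df1["number"], errors="coerce")

# tolist() yields plain Python numbers instead of NumPy scalars.
matches = list(zip(counts.tolist(), numbers.tolist()))
print(matches)  # [(3, 4), (0, 7)]
```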