Counting the occurrence of words in a dataframe column using a list of strings
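I have a list of strings and a dataframe with a text column. The text column holds several rows of text. I want to count how many times each word in the list of strings occurs in the text column. My goal is to add two columns to the dataframe: one containing the word and the other containing the number of occurrences. If there is a better solution, I am open to it; it would be great to learn different ways of doing this. Ideally, I want to end up with a single dataframe.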
string_list = ['had', 'it', 'the']
Current dataframe:
Dataframe code:
import pandas as pd

df = pd.DataFrame({'title': {0: 'book1', 1: 'book2', 2: 'book3', 3: 'book4', 4: 'book5'},
                   'text': {0: 'His voice had never sounded so cold',
                            1: 'When she arrived home, she noticed that the curtains were closed.',
                            2: 'He was terrified of small spaces and she knew',
                            3: "It was time. She'd fought against it for so long",
                            4: 'As he took in the view from the twentieth floor, the lights went out all over the city'},
                   'had': {0: 1, 1: 5, 2: 5, 3: 2, 4: 5},
                   'it': {0: 1, 1: 3, 2: 2, 3: 1, 4: 2},
                   'the': {0: 1, 1: 4, 2: 5, 3: 3, 4: 3}})
Trying to get a dataframe like this:
A function that finds the number of matches for a given pattern:
import re

def find_match_count(word: str, pattern: str) -> int:
    return len(re.findall(pattern, word.lower()))
Then iterate over each string and apply this function to the 'text' column:
for col in string_list:
    df[col] = df['text'].apply(find_match_count, pattern=col)
Using the dataframe you provided (without the had, it and the columns), this gives:
title text had it the
0 book1 His voice had never sounded so cold 1 0 0
1 book2 When she arrived home, she noticed that the cu... 0 0 1
2 book3 He was terrified of small spaces and she knew 0 0 0
3 book4 It was time. She'd fought against it for so long 0 2 0
4 book5 As he took in the view from the twentieth floo... 0 1 4
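Note that the plain pattern above matches substrings, which is why book5 gets a count of 1 for 'it' (from "city"). If whole-word counts are wanted instead, one possible variant is sketched below. The helper name count_whole_word is hypothetical and not part of the answer above; it assumes that wrapping the escaped word in \b anchors and ignoring case is the desired behavior.

```python
import re

import pandas as pd

string_list = ['had', 'it', 'the']
df = pd.DataFrame({
    'title': ['book1', 'book2', 'book3', 'book4', 'book5'],
    'text': [
        'His voice had never sounded so cold',
        'When she arrived home, she noticed that the curtains were closed.',
        'He was terrified of small spaces and she knew',
        "It was time. She'd fought against it for so long",
        'As he took in the view from the twentieth floor, the lights went out all over the city',
    ],
})


def count_whole_word(text: str, word: str) -> int:
    """Count case-insensitive whole-word occurrences of `word` in `text`."""
    # re.escape guards against words containing regex metacharacters;
    # \b keeps 'it' from matching inside 'city'.
    pattern = rf'\b{re.escape(word)}\b'
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# One new column per word, holding the per-row count.
for col in string_list:
    df[col] = df['text'].apply(count_whole_word, word=col)

print(df[['title'] + string_list])
```

With this variant book5 gets 0 for 'it', while book4 still gets 2 because both "It" and "it" are counted.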
Define a custom regex, then use extractall, join and melt:
regex = '|'.join(fr'(?P<{w}>\b{w}\b)' for w in string_list)
(df[['title', 'text']]
 .join(df['text'].str.extractall(regex).notna().groupby(level=0).sum())
 .fillna(0)
 .melt(id_vars=['title', 'text'], var_name='word', value_name='word count')
)
Output:
title text word word count
0 book1 His voice had never sounded so cold had 1.0
1 book2 When she arrived home, she noticed that the cu... had 0.0
2 book3 He was terrified of small spaces and she knew had 0.0
3 book4 It was time. She'd fought against it for so long had 0.0
4 book5 As he took in the view from the twentieth floo... had 0.0
5 book1 His voice had never sounded so cold it 0.0
6 book2 When she arrived home, she noticed that the cu... it 0.0
7 book3 He was terrified of small spaces and she knew it 0.0
8 book4 It was time. She'd fought against it for so long it 1.0
9 book5 As he took in the view from the twentieth floo... it 0.0
10 book1 His voice had never sounded so cold the 0.0
11 book2 When she arrived home, she noticed that the cu... the 1.0
12 book3 He was terrified of small spaces and she knew the 0.0
13 book4 It was time. She'd fought against it for so long the 0.0
14 book5 As he took in the view from the twentieth floo... the 4.0
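The output above is case-sensitive (the leading "It" in book4 is not counted) and the counts come out as floats because of the fillna step. As an alternative sketch, not taken from the answer above, Series.str.count can build one integer column per word, which is then melted into the same long format. The names counts and long_df are illustrative, and case-insensitive matching is an assumption about what is wanted.

```python
import re

import pandas as pd

string_list = ['had', 'it', 'the']
df = pd.DataFrame({
    'title': ['book1', 'book2', 'book3', 'book4', 'book5'],
    'text': [
        'His voice had never sounded so cold',
        'When she arrived home, she noticed that the curtains were closed.',
        'He was terrified of small spaces and she knew',
        "It was time. She'd fought against it for so long",
        'As he took in the view from the twentieth floor, the lights went out all over the city',
    ],
})

# One integer column per word: whole-word, case-insensitive counts per row.
counts = pd.DataFrame({
    w: df['text'].str.count(rf'\b{re.escape(w)}\b', flags=re.IGNORECASE)
    for w in string_list
})

# Reshape from wide (one column per word) to long (one row per title/word pair).
long_df = (
    df[['title', 'text']]
    .join(counts)
    .melt(id_vars=['title', 'text'], var_name='word', value_name='word count')
)

print(long_df)
```

Because every row gets a count (zero when there is no match), no fillna step is needed and the 'word count' column stays integer-valued.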