
Counting the occurrence of words in a dataframe column using a list of strings

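I have a list of strings and a dataframe with a text column. The text column holds several rows of text. I want to count how many times each word in the list of strings occurs in the text column. My goal is to add two columns to the dataframe: one containing the word and the other containing the number of occurrences. If there is a better solution, I am open to it. It would be great to learn different ways to accomplish this. Ideally, I want to end up with a single dataframe.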

string_list = ['had', 'it', 'the']

Current dataframe, as code:

import pandas as pd

# Sample data from the question; the had/it/the columns hold placeholder values.
df = pd.DataFrame({
    'title': {0: 'book1', 1: 'book2', 2: 'book3', 3: 'book4', 4: 'book5'},
    'text': {0: 'His voice had never sounded so cold',
             1: 'When she arrived home, she noticed that the curtains were closed.',
             2: 'He was terrified of small spaces and she knew',
             3: "It was time. She'd fought against it for so long",
             4: 'As he took in the view from the twentieth floor, the lights went out all over the city'},
    'had': {0: 1, 1: 5, 2: 5, 3: 2, 4: 5},
    'it': {0: 1, 1: 3, 2: 2, 3: 1, 4: 2},
    'the': {0: 1, 1: 4, 2: 5, 3: 3, 4: 3}})

The dataframe I am trying to get: [image]

A function to find the number of matches for a given pattern:

import re

def find_match_count(word: str, pattern: str) -> int:
    # Lower-case the text, then count every match of the pattern (substrings included).
    return len(re.findall(pattern, word.lower()))

Then iterate over each string and apply this function to the 'text' column:

for col in string_list:
    df[col] = df['text'].apply(find_match_count, pattern=col)

Using the dataframe you provided (without the had, it, and the columns), this gives:

   title                                               text  had  it  the
0  book1                His voice had never sounded so cold    1   0    0
1  book2  When she arrived home, she noticed that the cu...    0   0    1
2  book3      He was terrified of small spaces and she knew    0   0    0
3  book4   It was time. She'd fought against it for so long    0   2    0
4  book5  As he took in the view from the twentieth floo...    0   1    4
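
Note that the output above counts one 'it' for book5 because the bare pattern also matches inside "city". If whole-word counts are wanted, one option is to wrap each word in `\b` word boundaries. The following is a minimal sketch, not part of the original answer: the helper name `count_whole_words` is hypothetical, and case-insensitive matching is an assumption chosen to mirror the `.lower()` call above.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'title': ['book1', 'book2', 'book3', 'book4', 'book5'],
    'text': ['His voice had never sounded so cold',
             'When she arrived home, she noticed that the curtains were closed.',
             'He was terrified of small spaces and she knew',
             "It was time. She'd fought against it for so long",
             'As he took in the view from the twentieth floor, the lights went out all over the city'],
})
string_list = ['had', 'it', 'the']

def count_whole_words(texts: pd.Series, words: list) -> pd.DataFrame:
    """Hypothetical helper: one integer column of whole-word counts per word."""
    counts = {
        # re.escape guards against regex metacharacters in a word;
        # \b restricts matches to whole words, so 'it' no longer matches 'city'.
        w: texts.str.count(rf'\b{re.escape(w)}\b', flags=re.IGNORECASE)
        for w in words
    }
    return pd.DataFrame(counts, index=texts.index)

# Wide result: the original columns plus one count column per word.
wide = df.join(count_whole_words(df['text'], string_list))
print(wide[['title'] + string_list])
```

With this variant, book5 gets 0 for 'it', while book4 still gets 2 because "It" and "it" both count under case-insensitive matching.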

Define a custom regex, then use extractall, join, and melt:

regex = '|'.join(fr'(?P<{w}>\b{w}\b)' for w in string_list)

(df[['title', 'text']]
 .join(df['text'].str.extractall(regex).notna().groupby(level=0).sum())
 .fillna(0)
 .melt(id_vars=['title', 'text'], var_name='word', value_name='word count')
 )

Output:

    title                                               text word  word count
0   book1                His voice had never sounded so cold  had         1.0
1   book2  When she arrived home, she noticed that the cu...  had         0.0
2   book3      He was terrified of small spaces and she knew  had         0.0
3   book4   It was time. She'd fought against it for so long  had         0.0
4   book5  As he took in the view from the twentieth floo...  had         0.0
5   book1                His voice had never sounded so cold   it         0.0
6   book2  When she arrived home, she noticed that the cu...   it         0.0
7   book3      He was terrified of small spaces and she knew   it         0.0
8   book4   It was time. She'd fought against it for so long   it         1.0
9   book5  As he took in the view from the twentieth floo...   it         0.0
10  book1                His voice had never sounded so cold  the         0.0
11  book2  When she arrived home, she noticed that the cu...  the         1.0
12  book3      He was terrified of small spaces and she knew  the         0.0
13  book4   It was time. She'd fought against it for so long  the         0.0
14  book5  As he took in the view from the twentieth floo...  the         4.0
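
The counts above come out as floats because rows without any match are missing after the join and are then filled with `fillna(0)`. If integer counts are preferred, one possible adjustment is to reindex the per-row counts with `fill_value=0` before joining. This is a sketch under the same regex as above, not the answerer's code; the variable names `counts` and `long_df` are illustrative. It also assumes every word is a valid Python identifier, since each word is used as a named group.

```python
import pandas as pd

df = pd.DataFrame({
    'title': ['book1', 'book2', 'book3', 'book4', 'book5'],
    'text': ['His voice had never sounded so cold',
             'When she arrived home, she noticed that the curtains were closed.',
             'He was terrified of small spaces and she knew',
             "It was time. She'd fought against it for so long",
             'As he took in the view from the twentieth floor, the lights went out all over the city'],
})
string_list = ['had', 'it', 'the']

# One named group per word; matching is case-sensitive, as in the answer above.
regex = '|'.join(fr'(?P<{w}>\b{w}\b)' for w in string_list)

counts = (
    df['text'].str.extractall(regex)   # one row per match, one column per word
    .notna()                           # True in the column of the word that matched
    .groupby(level=0).sum()            # matches per original row
    .reindex(df.index, fill_value=0)   # rows with no match get 0 and stay integer
)

long_df = (
    df[['title', 'text']]
    .join(counts)
    .melt(id_vars=['title', 'text'], var_name='word', value_name='word count')
)
print(long_df[['title', 'word', 'word count']])
```

The long result has the same 15 rows as the output above, with 'word count' stored as integers.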

