
How to count word occurrences (from words in a specific list) and store the results in a new column in a Pandas DataFrame in Python?

I currently have a list of words about MMA.

I want to create a new column in my Pandas DataFrame called 'MMA Related Word Count'. I want to analyze the column 'Speech' for each row and sum up how often words (from the list below) occur within the speech. Does anyone know the best way to do this? I'd love to hear it, thanks in advance!

Please take a look at my DataFrame.

CODE EXAMPLE:

import pandas as pd

mma_related_words = ['mma', 'ju jitsu', 'boxing']

data = {
  "Name": ['Dana White', 'Triple H'],
  "Speech": ['mma is a fantastic sport. ju jitsu makes you better as a person.', 'Boxing sucks. Professional dancing is much better.']
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df) 

CURRENT DATAFRAME:

Name        Speech
Dana White  mma is a fantastic sport. ju jitsu makes you better as a person.
Triple H    Boxing sucks. Professional dancing is much better.

EXPECTED OUTPUT: Exactly the same as above, but with a new column 'MMA Related Word Count' on the right. For Dana White I want the value 2. For Triple H I want the value 1.

You can use a regex with str.count:

import re
regex = '|'.join(map(re.escape, mma_related_words))
# 'mma|ju\\ jitsu|boxing'

df['Word Count'] = df['Speech'].str.count(regex, flags=re.I)
# or
# df['Word Count'] = df['Speech'].str.count(r'(?i)'+regex)

Output:

         Name                                             Speech  Word Count
0  Dana White  mma is a fantastic sport. ju jitsu makes you b...           2
1    Triple H  Boxing sucks. Professional dancing is much bet...           1
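
One caveat with the pattern above: it matches substrings, so 'mma' would also be counted inside a word such as 'command'. If whole-word matching is what you want, a minimal sketch could wrap the alternation in \b word boundaries. The third row and its speaker name are made up purely to illustrate the difference. Sorting the terms longest first is an optional precaution for lists where one term is a prefix of another.

```python
import re
import pandas as pd

mma_related_words = ['mma', 'ju jitsu', 'boxing']

df = pd.DataFrame({
    "Name": ['Dana White', 'Triple H', 'Hypothetical Speaker'],
    "Speech": [
        'mma is a fantastic sport. ju jitsu makes you better as a person.',
        'Boxing sucks. Professional dancing is much better.',
        'I command you to stop.',  # made-up row: "mma" appears only inside "command"
    ],
})

# Longest terms first, so a longer phrase is tried before a shorter one it starts with
terms = sorted(mma_related_words, key=len, reverse=True)

# \b on both sides restricts matches to whole words or phrases
pattern = r'\b(?:' + '|'.join(map(re.escape, terms)) + r')\b'

df['MMA Related Word Count'] = df['Speech'].str.count(pattern, flags=re.I)
print(df[['Name', 'MMA Related Word Count']])
```

With the plain pattern from the answer, the made-up third row would get a count of 1. With the word boundaries it gets 0.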

A simple loop inside a function passed to apply should also work. Try this:

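def fun(string):
    # Count how many of the listed words appear in the text (case-insensitive).
    # Each word adds at most 1, however many times it occurs.
    text = str(string).lower()
    cnt = 0
    for w in mma_related_words:
        if w.lower() in text:
            cnt += 1
    return cnt

df['MMA Related Word Count'] = df['Speech'].apply(fun)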

The same can also be written as:

df['MMA Related Word Count1'] = df['Speech'].apply(lambda x: sum([1 for w in mma_related_words if w.lower() in str(x).lower()]))
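
Note that both versions above test membership with `in`, so each list word adds at most 1 per row no matter how often it appears. If the goal is to count every occurrence, a sketch along the following lines swaps the membership test for Python's built-in str.count. The helper name count_occurrences and the third sample row are assumptions for illustration, not part of the original question.

```python
import pandas as pd

mma_related_words = ['mma', 'ju jitsu', 'boxing']

df = pd.DataFrame({
    "Name": ['Dana White', 'Triple H', 'Hypothetical Speaker'],
    "Speech": [
        'mma is a fantastic sport. ju jitsu makes you better as a person.',
        'Boxing sucks. Professional dancing is much better.',
        'MMA, mma and more mma.',  # made-up row with a repeated word
    ],
})

def count_occurrences(text):
    # str.count returns the number of non-overlapping substring matches
    lowered = str(text).lower()
    return sum(lowered.count(w.lower()) for w in mma_related_words)

df['MMA Related Word Count'] = df['Speech'].apply(count_occurrences)
print(df[['Name', 'MMA Related Word Count']])
```

For the two rows from the question this gives the same values as before (2 and 1); the difference only shows up when a word repeats within one speech.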

