Count occurrences of names from a list in a text column

I have a `df.Description` column containing text such as `"shirt XML tag t-shirt (Black) XL"`, where the colour is always in brackets.
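Because the colour always sits inside round brackets, it can be pulled out directly with a regular expression instead of testing every colour name one by one. A minimal sketch using Python's standard `re` module; the helper name `bracketed_colour` is hypothetical, and it assumes the first bracketed group in a description is the colour:

```python
import re

# Captures the text inside the first pair of round brackets, e.g. "(Black)" -> "Black"
BRACKET_RE = re.compile(r"\(([^)]*)\)")

def bracketed_colour(description):
    """Hypothetical helper: return the first bracketed token, or None if there is none."""
    match = BRACKET_RE.search(description)
    return match.group(1).strip() if match else None

print(bracketed_colour("shirt XML tag t-shirt (Black) XL"))  # → Black
```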

I then have a tuple of colours:

```python
Colours = ('Blue',
           'Orange',
           'Green',
           'Red',
           'Purple',
           'Brown',
           'Pink',
           'Gray',
           'Olive',
           'Cyan',
           'Black',
           'White')
```

My aim is to count how often each colour in `Colours` appears together with the word "t-shirt" in the `df.Description` column.

Basically, I need to know which t-shirt colour has sold the most, so I need to count the colours only in rows where `df.Description` also contains the word "t-shirt".
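Since the colour is always bracketed, the whole count can also be done with vectorized pandas string methods rather than a row-by-row function. This is one possible approach, not the asker's or the answerer's code: the sample rows are invented for illustration, colour names are normalized with `str.title()`, and any bracketed value not found in `Colours` (such as "Teal") is simply dropped. It relies on `Series.str.contains`, `Series.str.extract`, `Series.isin`, `value_counts` and `idxmax`:

```python
import pandas as pd

# The question's colour tuple
Colours = ('Blue', 'Orange', 'Green', 'Red', 'Purple', 'Brown',
           'Pink', 'Gray', 'Olive', 'Cyan', 'Black', 'White')

# Hypothetical sample data standing in for the real df
df = pd.DataFrame({"Description": [
    "shirt XML tag t-shirt (Black) XL",
    "shirt XML tag t-shirt (White) M",
    "shirt XML tag T-Shirt (black) S",
    "hoodie (Black) L",
    "shirt XML tag t-shirt (Teal) M",
]})

# Keep only rows mentioning "t-shirt" (case-insensitive, literal match; NaN counts as no match)
is_tshirt = df["Description"].str.contains("t-shirt", case=False, na=False, regex=False)

# Extract the bracketed text for those rows and normalize it to title case ("black" -> "Black")
colour = (df.loc[is_tshirt, "Description"]
            .str.extract(r"\(([^)]*)\)", expand=False)
            .str.strip()
            .str.title())

# Discard anything that is not a known colour, then count
counts = colour[colour.isin(Colours)].value_counts()

print(counts)
print("Best-selling t-shirt colour:", counts.idxmax())
```

With real data, drop the sample `df` and run the three steps against your own DataFrame.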

You can use `apply()` and `value_counts()` to count the occurrences of each colour. I would also suggest some text preprocessing, such as converting to lowercase and removing punctuation:

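```python
def find_col(x):
    Colours = ['Blue', 'Orange', 'Green', 'Red', 'Purple', 'Brown',
               'Pink', 'Gray', 'Olive', 'Cyan', 'Black', 'White']

    # Guard against missing (NaN) descriptions, which would raise a TypeError
    if not isinstance(x, str):
        return 'Not a tshirt'

    # Lowercase so "T-Shirt"/"t-shirt" and "(Black)"/"(black)" all match
    text = x.lower()
    if 't-shirt' not in text:
        return 'Not a tshirt'

    for col in Colours:
        # The colour is always in brackets, so match "(black)" rather than the bare
        # word; this avoids false hits such as "Red" inside "Redwood"
        if '(' + col.lower() + ')' in text:
            return col + ' t-shirt'
    return 'Not a tshirt'


# df is the DataFrame from the question
df['Colors'] = df['Description'].apply(find_col)

df['Colors'].value_counts()
```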
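One caveat on the preprocessing suggestion: stripping *all* punctuation would also delete the hyphen in "t-shirt" and the brackets around the colour, so a check like `'t-shirt' in x` would stop matching. A minimal sketch of a gentler normalization, assuming you want to keep `-` and `()`; the helper name `preprocess` is hypothetical and uses only the standard `re` and `string` modules:

```python
import re
import string

# Punctuation to strip, keeping "-" (needed for "t-shirt") and "()" (the colour markers)
_KEEP = "-()"
_STRIP = "".join(ch for ch in string.punctuation if ch not in _KEEP)
_STRIP_RE = re.compile("[" + re.escape(_STRIP) + "]")

def preprocess(text):
    """Hypothetical helper: lowercase, replace most punctuation with spaces, collapse whitespace."""
    if not isinstance(text, str):
        return ""
    text = _STRIP_RE.sub(" ", text.lower())
    return " ".join(text.split())

print(preprocess("Shirt, XML-tag: T-Shirt (Black) XL!"))  # → shirt xml-tag t-shirt (black) xl
```

If you apply this before matching colours, compare against lowercased colour names (for example `col.lower()`), since the text will no longer contain "Black".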
