Count occurrences of names from a list in a text column

Question

I have a df.Description column, which contains text such as

"shirt XML tag t-shirt (Black) XL",

where the color is always in brackets.

I then have a list of colors:

Colours = ('Blue',
 'Orange',
 'Green',
 'Red',
 'Purple',
 'Brown',
 'Pink',
 'Gray',
 'Olive',
 'Cyan',
 'Black',
 'White')

My aim is to count the occurrences of each word in the list Colours alongside the word "t-shirt" in that df.Description column.

Basically I need to know which t-shirt colour has been sold the most, so I need to count the colors only in rows where df.Description also contains the word t-shirt.

Answer 1

You can use apply() and value_counts() to count the occurrences of each color. I would also suggest some text preprocessing, such as converting to lowercase and removing punctuation:

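# Colours to look for; the matching below is case-sensitive.
Colours = ['Blue', 'Orange', 'Green', 'Red', 'Purple', 'Brown',
           'Pink', 'Gray', 'Olive', 'Cyan', 'Black', 'White']

def find_col(x):
    # Return the first listed colour found in a description that also mentions 't-shirt'.
    for col in Colours:
        if (col in x) and ('t-shirt' in x):
            return col + ' t-shirt'
    return 'Not a tshirt'


# Label every row, then count how often each label occurs.
df['Colors'] = df['Description'].apply(find_col)

df['Colors'].value_counts()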
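
Since the question says the color is always in brackets, a vectorized alternative could look something like the sketch below. It is not part of the original answer. The helper name count_tshirt_colours and the sample DataFrame are made up for illustration, and it assumes the color appears as "(Colour)" with nothing else inside the brackets. It lowercases the text before checking for "t-shirt" and matches colors case-insensitively, in the spirit of the preprocessing suggestion above.

```python
import re

import pandas as pd

COLOURS = ("Blue", "Orange", "Green", "Red", "Purple", "Brown",
           "Pink", "Gray", "Olive", "Cyan", "Black", "White")


def count_tshirt_colours(descriptions, colours=COLOURS):
    """Count bracketed colours in descriptions that mention 't-shirt'.

    Hypothetical helper: assumes the colour is written as "(Colour)".
    """
    # Keep only rows that mention "t-shirt", ignoring case; missing values count as no match.
    is_tshirt = descriptions.str.lower().str.contains("t-shirt", regex=False, na=False)
    # Capture a known colour that fills the whole bracket, e.g. "(Black)".
    pattern = r"\((" + "|".join(re.escape(c) for c in colours) + r")\)"
    found = descriptions[is_tshirt].str.extract(pattern, flags=re.IGNORECASE, expand=False)
    # Normalise case so "black" and "Black" are counted together;
    # rows without a recognised colour are NaN and are skipped by value_counts().
    return found.str.capitalize().value_counts()


# Made-up sample data for illustration only.
df = pd.DataFrame({"Description": [
    "shirt XML tag t-shirt (Black) XL",
    "t-shirt (Red) M",
    "T-shirt (black) S",
    "hoodie (Black) L",
    "t-shirt (Redwood) L",
]})

counts = count_tshirt_colours(df["Description"])
print(counts)
```

With this sample, Black is counted twice and Red once: the hoodie row is skipped because it does not mention t-shirt, and "(Redwood)" is skipped because the bracket must contain exactly one listed color. That also avoids the substring problem of a plain `col in x` test, where 'Red' would match any text containing those letters.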
