Count occurrences of names from a list in a text column
I have a df.Description column, which has text such as
"shirt XML tag t-shirt (Black) XL",
where the color is always in brackets.
I then have a list of colors:
Colours = ('Blue',
'Orange',
'Green',
'Red',
'Purple',
'Brown',
'Pink',
'Gray',
'Olive',
'Cyan',
'Black',
'White')
My aim is to count the occurrences of each word in the list Colours alongside the word "t-shirt" in the df.Description column.
Basically, I need to know which t-shirt colour has sold the most, so I need to count a colour only when the word "t-shirt" also appears in df.Description.
You can use apply() and value_counts() to count the occurrences of each colour. I would also suggest some text preprocessing, such as converting to lowercase and removing punctuation:
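def find_col(x):
    # Colours to look for in each description
    Colours = ['Blue',
               'Orange',
               'Green',
               'Red',
               'Purple',
               'Brown',
               'Pink',
               'Gray',
               'Olive',
               'Cyan',
               'Black',
               'White']
    for col in Colours:
        # Count a colour only when the description also mentions a t-shirt
        if (col in x) and ('t-shirt' in x):
            return col + ' t-shirt'
    return 'Not a tshirt'

# Label every row, then count how often each label occurs
df['Colors'] = df['Description'].apply(find_col)
df['Colors'].value_counts()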
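Since the question says the colour is always in brackets, a vectorized variant may also be worth trying. The following is only a sketch under some assumptions: the sample DataFrame is made up for illustration, the names COLOURS, is_tshirt and tshirt_colour_counts are hypothetical, and matching is done on lowercased text so that "T-Shirt (black)" and "t-shirt (Black)" are counted together. It only accepts a colour when it appears inside brackets, which avoids substring hits such as 'Red' inside an unrelated word.

```python
import re
import pandas as pd

COLOURS = ['Blue', 'Orange', 'Green', 'Red', 'Purple', 'Brown',
           'Pink', 'Gray', 'Olive', 'Cyan', 'Black', 'White']

# Hypothetical sample data standing in for the real df
df = pd.DataFrame({'Description': [
    'shirt XML tag t-shirt (Black) XL',
    'T-Shirt basic (black) M',
    'hoodie zip (Red) L',
    't-shirt crew neck (Red) S',
    't-shirt plain (Black) L',
]})

# Lowercase once so that matching is case-insensitive
desc = df['Description'].str.lower()

# Boolean mask: rows that mention "t-shirt" anywhere in the text
is_tshirt = desc.str.contains('t-shirt', regex=False)

# Capture a known colour only when it is wrapped in brackets, e.g. "(black)"
pattern = r'\((' + '|'.join(re.escape(c.lower()) for c in COLOURS) + r')\)'
colour = desc.str.extract(pattern, expand=False)

# Count colours on t-shirt rows only; rows without a bracketed colour are NaN
# and are dropped by value_counts() by default
tshirt_colour_counts = colour[is_tshirt].value_counts()
print(tshirt_colour_counts)
```

tshirt_colour_counts.idxmax() then gives the most frequent t-shirt colour. If the real data can contain colours outside brackets, the pattern would need to be loosened accordingly.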