Is there a Python function to count the strings in a cell and report them in a new dataframe?

Question

Let's say I have a grocery list with one column titled "Groceries". Each row contains a comma-separated list of strings, for example:

Groceries
apples, bananas, oranges
apples, bananas, bananas, pears
oranges, pears, bananas

Is there a way to count each string and add a "tally" in a new dataframe (or something similar), with a column labeled for each item? The dataframe would then look like:

apples	oranges	bananas	pears
1	1	1	0
1	0	2	1
0	1	1	1

I can't find a function that will recognize the strings and count them in the appropriate row and column, with the string as the column name. I am also pretty new to Python and am not sure what would go into creating a function that does this.

Answer 1

You can split the strings on commas with str.split, explode the resulting lists into multiple rows, use get_dummies to turn each item into a 0/1 indicator column, and aggregate per original row with groupby.sum:

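import pandas as pd

# df is the question's dataframe with a "Groceries" column
out = (pd
 .get_dummies(df['Groceries'].str.split(r',\s*').explode())  # one 0/1 column per item
 .groupby(level=0).sum()  # add the indicators back up per original row
)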
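To make the chain above easier to follow, here is a minimal, self-contained sketch that runs it one step at a time on the sample data from the question. The variable names (`df`, `items`, `exploded`, `dummies`) are only illustrative, and the dataframe is rebuilt by hand on the assumption that each cell is a single comma-separated string.

```python
import pandas as pd

# Sample data copied from the question; building it this way is an assumption
# about how the real dataframe looks (one comma-separated string per cell).
df = pd.DataFrame({
    "Groceries": [
        "apples, bananas, oranges",
        "apples, bananas, bananas, pears",
        "oranges, pears, bananas",
    ]
})

# 1. Split each cell on a comma followed by optional whitespace.
#    Each row now holds a list of item names.
items = df["Groceries"].str.split(r",\s*")

# 2. Explode the lists so every item gets its own row.
#    The original row label is repeated for each item.
exploded = items.explode()

# 3. Build one indicator column per distinct item.
dummies = pd.get_dummies(exploded)

# 4. Sum the indicators per original row label to get the counts.
out = dummies.groupby(level=0).sum()
print(out)
```

Because the original row labels survive the explode step, `groupby(level=0)` is enough to collapse the indicators back to one row per grocery list, and repeated items (the two `bananas` in the second row) are counted rather than flattened to 1.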

Or, similarly, with crosstab:

# one item per row, keeping the original row index
s = df['Groceries'].str.split(r',\s*').explode()
# count each (row, item) pair
out = pd.crosstab(s.index, s)

Output:

   apples  bananas  oranges  pears
0       1        1        1      0
1       1        2        0      1
2       0        1        1      1
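The columns above come out in alphabetical order, while the question lists them as apples, oranges, bananas, pears. As a rough sketch (not part of the original answer), the crosstab variant can be run end to end and then reordered with `reindex`. The `wanted` list and the sample dataframe are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question (assumed layout: one string per cell).
df = pd.DataFrame({
    "Groceries": [
        "apples, bananas, oranges",
        "apples, bananas, bananas, pears",
        "oranges, pears, bananas",
    ]
})

# One item per row, keeping the original row index.
s = df["Groceries"].str.split(r",\s*").explode()

# Cross-tabulate row label against item name to get the counts.
out = pd.crosstab(s.index, s)

# crosstab names both axes; clear the names so the table prints plainly.
out.index.name = None
out.columns.name = None

# Hypothetical column order taken from the question's expected table.
# fill_value=0 gives a zero column for any listed item that never appears.
wanted = ["apples", "oranges", "bananas", "pears"]
out = out.reindex(columns=wanted, fill_value=0)
print(out)
```

Using `reindex` rather than plain column selection means an item that is in `wanted` but absent from the data shows up as a column of zeros instead of raising a `KeyError`.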

2 2022-09-20 16:50:52
