
Is there a Python function for counting the number of strings in a cell and reporting these in a new dataframe?

Let's say I have a grocery list with one column titled "Groceries". In each row there is a list of strings, for example:

Groceries
apples, bananas, oranges
apples, bananas, bananas, pears
oranges, pears, bananas

Is there a way to count each string and add a "tally" in a new dataframe, or something similar, with each item appropriately labeled? The dataframe would then look like:

apples  oranges  bananas  pears
1       1        1        0
1       0        2        1
0       1        1        1

I can't find a function that will recognize the strings and count them in the appropriate row/column under each string's name. I am also pretty new to Python and am not sure what would go into creating a function that would do this.

You can split the string on commas, explode to multiple rows, use get_dummies to transform to 0/1 indicator columns, and groupby.sum to aggregate per original row:

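import pandas as pd

# split on commas, one item per row, one-hot encode, then sum per original row
out = (pd
 .get_dummies(df['Groceries'].str.split(r',\s*').explode())
 .groupby(level=0).sum()
)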
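
To make the chain easier to follow, here is a minimal self-contained sketch that runs the same steps one at a time. The sample DataFrame is rebuilt from the grocery list in the question, and the intermediate names (items, exploded, dummies) are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question's grocery list
df = pd.DataFrame({
    "Groceries": [
        "apples, bananas, oranges",
        "apples, bananas, bananas, pears",
        "oranges, pears, bananas",
    ]
})

# Step 1: split each cell into a list of items (comma plus optional spaces)
items = df["Groceries"].str.split(r",\s*")

# Step 2: one item per row; the original row label is repeated in the index
exploded = items.explode()

# Step 3: one indicator column per distinct item
dummies = pd.get_dummies(exploded)

# Step 4: add the indicators back up per original row label
out = dummies.groupby(level=0).sum()
print(out)
```

Because explode keeps the original row label in the index, groupby(level=0) is enough to collapse the indicator rows back to one row per grocery list.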

Or similarly with crosstab:

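# one item per row, keeping the original row label in the index
s = df['Groceries'].str.split(r',\s*').explode()
# count how often each item occurs per original row
out = pd.crosstab(s.index, s)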

Output:

   apples  bananas  oranges  pears
0       1        1        1      0
1       1        2        0      1
2       0        1        1      1
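
If the column order from the question (apples, oranges, bananas, pears) matters, the result can be tidied afterwards. The following is a rough sketch under that assumption, using the crosstab variant. The sample data is rebuilt from the question, and the name tally and the hard-coded column list are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question's grocery list
df = pd.DataFrame({
    "Groceries": [
        "apples, bananas, oranges",
        "apples, bananas, bananas, pears",
        "oranges, pears, bananas",
    ]
})

# One item per row, keeping the original row label in the index
s = df["Groceries"].str.split(r",\s*").explode()

# Count how often each (row label, item) pair occurs
tally = pd.crosstab(s.index, s)

# Drop the axis names that crosstab adds, for a plain-looking table
tally = tally.rename_axis(index=None, columns=None)

# Assumed column order taken from the question; fill_value=0 covers
# any listed item that never appears in the data
wanted = ["apples", "oranges", "bananas", "pears"]
tally = tally.reindex(columns=wanted, fill_value=0)
print(tally)
```

Both approaches sort the columns alphabetically by default, so the reindex step is only needed when a specific order is wanted.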
