Aggregating multi-column qualitative data with pandas?

Question

I want to go from this:

	name	pet
1	Rashida	dog
2	Rashida	cat
3	Jim	dog
4	Jim	dog

to this:

	name	num_dogs	num_cats
1	Jim	2	0
2	Rashida	1	1

In R I would do:

df %>% 
  group_by(name) %>% 
  summarize(num_dogs = length(which(pet == "dog")),
            num_cats = length(which(pet == "cat")))

How would I do this using pandas?

Answer 1

There are lots of different ways to do this.

If you are filtering on the value of a single column, you can use .agg with a custom lambda function.

import numpy as np

(df.groupby(["name"])
  .agg(
      # Named aggregation: each keyword becomes an output column.
      num_dogs=("pet", lambda x: np.sum(x == "dog")),
      num_cats=("pet", lambda x: np.sum(x == "cat")))
)

Or:

(df
  .groupby(["name", "pet"])
  .size()
  .unstack("pet", fill_value=0)
  .add_prefix("num_").add_suffix("s")
)

You can also use a pivot table.

df.reset_index().pivot_table(index="name", columns="pet", values="index", aggfunc="count", fill_value=0)
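
The snippets above assume a df that already exists. As a minimal, self-contained sketch (assuming the data from the question with the "JIm" typo corrected, and using hypothetical variable names by_agg, by_size and by_pivot), the three approaches can be run side by side to check that they give the same counts. The lambdas here use the Series .sum() method instead of np.sum, which should behave the same way and avoids the numpy import.

```python
import pandas as pd

# The data from the question, with the name typo ("JIm") corrected.
df = pd.DataFrame({
    "name": ["Rashida", "Rashida", "Jim", "Jim"],
    "pet": ["dog", "cat", "dog", "dog"],
})

# Named aggregation: one output column per condition.
by_agg = df.groupby("name").agg(
    num_dogs=("pet", lambda x: (x == "dog").sum()),
    num_cats=("pet", lambda x: (x == "cat").sum()),
)

# Count (name, pet) pairs, then spread the pet values into columns.
by_size = (
    df.groupby(["name", "pet"])
    .size()
    .unstack("pet", fill_value=0)
    .add_prefix("num_")
    .add_suffix("s")
)

# Pivot table that counts row indices per (name, pet) combination.
by_pivot = df.reset_index().pivot_table(
    index="name", columns="pet", values="index", aggfunc="count", fill_value=0
)

print(by_agg)
print(by_size)
print(by_pivot)
```

Note that the pivot table keeps the raw pet values ("cat", "dog") as column names, while the other two produce num_cats and num_dogs.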

But if you need to filter based on two columns, the .agg lambda shown earlier, which only sees one column at a time, will not work. For example, suppose you need to know how many old dogs each person has.

df = pd.DataFrame({'name': ["Rashida", "Rashida", "Joe", "Joe"],
                   'pet': ['dog', 'cat', 'dog', 'dog'],
                   'age': ["old", "old", "old", "young"]})

You can use a pivot table with both columns:

df.reset_index().pivot_table(index="name", columns=["pet", "age"], values="index", aggfunc="count", fill_value=0)

Or a crosstab:

pd.crosstab(df["name"], [df["pet"], df["age"]], dropna=False).unstack().reset_index()
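
If only a few specific combinations matter, such as old dogs, another option is to build a boolean column per condition and sum it within each group. This is a sketch rather than part of the original answer, and the column names num_old_dogs and num_old_cats are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Rashida", "Rashida", "Joe", "Joe"],
    "pet": ["dog", "cat", "dog", "dog"],
    "age": ["old", "old", "old", "young"],
})

# One boolean column per condition (hypothetical column names).
# True counts as 1 when summed, so a group sum is a count of matches.
flags = pd.DataFrame({
    "name": df["name"],
    "num_old_dogs": (df["pet"] == "dog") & (df["age"] == "old"),
    "num_old_cats": (df["pet"] == "cat") & (df["age"] == "old"),
})

counts = flags.groupby("name").sum().reset_index()
print(counts)
```

This stays close to the shape of the R summarize call, since each condition can combine any number of columns.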

Or you can use siuba, a port of dplyr, to mimic the original R syntax, but I haven't used it enough to know how to use it well.

from siuba import group_by, summarize, _

Answer 2

You can use datar, which uses pandas as its backend:

>>> from datar.all import f, tribble, length, group_by, which, summarise
>>> 
>>> df = tribble(
...     f.name,    f.pet,
...     "Rashida", "dog",
...     "Rashida", "cat",
...     "Jim",     "dog",
...     "Jim",     "dog",
... )
>>> 
>>> df >> group_by(f.name) >> summarise(
...     num_dogs = length(which(f.pet == "dog")),
...     num_cats = length(which(f.pet == "cat"))
... )
      name  num_dogs  num_cats
  <object>   <int64>   <int64>
0      Jim         2         0
1  Rashida         1         1

Answer 1
1, accepted, 2021-01-22 00:40:33

Answer 2
1, 2021-06-15 05:42:04
