
Grouping the counts of two pandas columns into a dataframe

I have the following data frame:

Sport     AgeGroup
Baseball  20s
Football  20s
Baseball  30s
Baseball  20s
Football  20s
Football  20s
Football  30s

The goal is to count how often each combination of the two columns occurs and to return the result as a list of lists in the following format: [['baseball', '20s', 2], ['baseball', '30s', 1], ['football', '20s', 3], ['football', '30s', 1]] where each inner list has the form [sport, ageGroup, number of rows in which both values occur together].

So far I've gotten part of the way there with: sport_age_count = df.groupby(["Sport", "AgeGroup"]).size() . The problem is that the result isn't indexable the way I expected; it seems to treat the counts as a single list. The code above gives the following result:

Sport     AgeGroup
Baseball  20s         2
          30s         1
Football  20s         3
          30s         1

The problem is that when I try to access the result by Baseball, Football, 20s, or 30s, it won't let me; I can only access the counts. Beyond that, I'd really like the result formatted as the list of lists shown above. I wasn't sure whether there is a Python trick for this other than building the list myself with extensive for loops.

Another solution, using .agg():

print(
    df.groupby(["Sport", "AgeGroup"], as_index=False)
    .size()
    .agg(list, axis=1)
    .tolist()
)

Prints:

[['Baseball', '20s', 2], ['Baseball', '30s', 1], ['Football', '20s', 3], ['Football', '30s', 1]]
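
For comparison, the same list can be built by iterating over the grouped Series directly, since each index entry is a (Sport, AgeGroup) tuple and each value is the count. This is only a sketch: the DataFrame is rebuilt here from the table in the question, and the variable names counts and rows are illustrative rather than taken from the answer.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Sport": ["Baseball", "Football", "Baseball", "Baseball",
              "Football", "Football", "Football"],
    "AgeGroup": ["20s", "20s", "30s", "20s", "20s", "20s", "30s"],
})

# Series of counts with a two-level (Sport, AgeGroup) index.
counts = df.groupby(["Sport", "AgeGroup"]).size()

# Unpack each (Sport, AgeGroup) index tuple next to its count.
rows = [[sport, age, int(n)] for (sport, age), n in counts.items()]
print(rows)
```

The int() call just converts the NumPy integer to a plain Python int, which may matter if the list is later serialized.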

If you want an array (which makes for easier indexing), you can reset the index and call to_numpy() after your groupby:

sport_age_count.reset_index().to_numpy()

array([['Baseball', '20s', 2],
       ['Baseball', '30s', 1],
       ['Football', '20s', 3],
       ['Football', '30s', 1]], dtype=object)
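
To illustrate what "easier indexing" could look like, here is a small sketch of row, column, and boolean-mask access on that array. The DataFrame is rebuilt from the question's table, and the names arr, first_row, count_column, and football_rows are assumptions made for this example.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Sport": ["Baseball", "Football", "Baseball", "Baseball",
              "Football", "Football", "Football"],
    "AgeGroup": ["20s", "20s", "30s", "20s", "20s", "20s", "30s"],
})

# Object-dtype array with one [sport, age group, count] row per combination.
arr = (
    df.groupby(["Sport", "AgeGroup"])
    .size()
    .reset_index(name="count")
    .to_numpy()
)

first_row = arr[0]                            # a single row
count_column = arr[:, 2]                      # every count
football_rows = arr[arr[:, 0] == "Football"]  # rows filtered by sport

print(first_row, count_column, football_rows, sep="\n")
```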

But if you want a list of lists, you can call NumPy's tolist() on that array:

sport_age_count.reset_index().to_numpy().tolist()

[['Baseball', '20s', 2],
 ['Baseball', '30s', 1],
 ['Football', '20s', 3],
 ['Football', '30s', 1]]
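
If the grouped Series itself is enough, its labels should also be reachable through the two-level index rather than only through a conversion. The following is a sketch under that assumption, with the DataFrame rebuilt from the question's table; baseball_20s, football, and labels are illustrative names.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Sport": ["Baseball", "Football", "Baseball", "Baseball",
              "Football", "Football", "Football"],
    "AgeGroup": ["20s", "20s", "30s", "20s", "20s", "20s", "30s"],
})

counts = df.groupby(["Sport", "AgeGroup"]).size()

# Look up a single combination with a (Sport, AgeGroup) tuple.
baseball_20s = counts.loc[("Baseball", "20s")]

# Selecting on the first level alone gives a Series indexed by AgeGroup.
football = counts.loc["Football"]

# The index holds the label pairs that the question wanted to reach.
labels = counts.index.tolist()

print(baseball_20s, football.to_dict(), labels, sep="\n")
```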
