Grouping the counts of two pandas columns into a dataframe
I have the following data frame:
Sport | AgeGroup |
---|---|
Baseball | 20s |
Football | 20s |
Baseball | 30s |
Baseball | 20s |
Football | 20s |
Football | 20s |
Football | 30s |
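The answers below can be tried out by rebuilding the sample table as a DataFrame. This is a minimal sketch. It assumes the column names are exactly `Sport` and `AgeGroup`, as in the table header, and the real data may differ.

```python
import pandas as pd

# Rebuild the sample table from the question, row for row.
# Assumption: the real column names are "Sport" and "AgeGroup".
df = pd.DataFrame(
    {
        "Sport": [
            "Baseball", "Football", "Baseball", "Baseball",
            "Football", "Football", "Football",
        ],
        "AgeGroup": ["20s", "20s", "30s", "20s", "20s", "20s", "30s"],
    }
)

print(df)
```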
The goal is to count how often each combination of the two columns occurs and collect those counts into a list of lists in the following format: [['baseball', '20s', 2], ['baseball', '30s', 1], ['football', '20s', 3], ['football', '30s', 1]]
where each inner list has the form [sport, ageGroup, number of rows in which that sport and age group appear together].
So far, I've gotten part of the way there with: sport_age_count = df.groupby(["Sport", "AgeGroup"]).size()
The problem is that the result isn't easy to index: it is a Series of counts, with the sport and age-group values stored in its index rather than as regular values, so it behaves like a single list of counts. With the code above I get the following result:
Sport | AgeGroup | |
---|---|---|
Baseball | 20s | 2 |
| | 30s | 1 |
Football | 20s | 3 |
| | 30s | 1 |
The problem is that when I try to access Baseball, Football, 20s, or 30s, it won't let me; I can only access the counts. Beyond that, I'd really like the result formatted as the list of lists shown above. I wasn't sure whether there are any Python tricks for this besides building the list myself with a lot of for loops.
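This behaviour follows from the shape of the groupby result. `size()` returns a Series whose index is a two-level MultiIndex, so "Baseball" or "20s" are index labels rather than values. A minimal sketch of how to reach them with `.loc` and the index levels, assuming the column names `Sport` and `AgeGroup` from the sample table:

```python
import pandas as pd

# Sample data from the question (column names assumed to be Sport/AgeGroup).
df = pd.DataFrame(
    {
        "Sport": ["Baseball", "Football", "Baseball", "Baseball",
                  "Football", "Football", "Football"],
        "AgeGroup": ["20s", "20s", "30s", "20s", "20s", "20s", "30s"],
    }
)

sport_age_count = df.groupby(["Sport", "AgeGroup"]).size()

# The result is a Series; its index is a two-level MultiIndex.
print(type(sport_age_count).__name__)     # Series
print(list(sport_age_count.index.names))  # ['Sport', 'AgeGroup']

# A full (sport, age group) tuple selects one count.
print(sport_age_count.loc[("Baseball", "20s")])  # 2

# A label from the first level selects every age group for that sport.
football = sport_age_count.loc["Football"]
print(football.to_dict())

# The labels themselves can be read back from the index levels.
print(sport_age_count.index.get_level_values("Sport").tolist())
```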
Another solution, using .agg():
print(
    df.groupby(["Sport", "AgeGroup"], as_index=False)
    .size()
    .agg(list, axis=1)
    .tolist()
)
Prints:
[['Baseball', '20s', 2], ['Baseball', '30s', 1], ['Football', '20s', 3], ['Football', '30s', 1]]
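To see why this works, it helps to look at the intermediate result. With `as_index=False`, `size()` returns a DataFrame (pandas 1.1 or later) whose count column is named `size`, so each row already holds sport, age group and count. The sketch below, which assumes the sample data from the question, compares the `.agg(list, axis=1)` route with taking the DataFrame's values directly via `to_numpy().tolist()`.

```python
import pandas as pd

# Sample data from the question (column names assumed to be Sport/AgeGroup).
df = pd.DataFrame(
    {
        "Sport": ["Baseball", "Football", "Baseball", "Baseball",
                  "Football", "Football", "Football"],
        "AgeGroup": ["20s", "20s", "30s", "20s", "20s", "20s", "30s"],
    }
)

# With as_index=False, size() returns a DataFrame with a "size" column.
counts = df.groupby(["Sport", "AgeGroup"], as_index=False).size()
print(counts.columns.tolist())  # ['Sport', 'AgeGroup', 'size']

# .agg(list, axis=1) turns each row into a Python list ...
rows_via_agg = counts.agg(list, axis=1).tolist()

# ... which matches converting the underlying values directly.
rows_via_values = counts.to_numpy().tolist()

print(rows_via_agg == rows_via_values)  # True
```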
If you want an array (which makes indexing easier), you can reset the index and call to_numpy() after your groupby:
sport_age_count.reset_index().to_numpy()
array([['Baseball', '20s', 2],
['Baseball', '30s', 1],
['Football', '20s', 3],
['Football', '30s', 1]], dtype=object)
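Once the index is reset, the sport and age-group labels are ordinary columns, so they can be indexed like the counts. A short sketch, assuming the sample data from the question; `reset_index(name="count")` is used here only to give the otherwise unnamed count column (labelled `0`) a readable name.

```python
import pandas as pd

# Sample data from the question (column names assumed to be Sport/AgeGroup).
df = pd.DataFrame(
    {
        "Sport": ["Baseball", "Football", "Baseball", "Baseball",
                  "Football", "Football", "Football"],
        "AgeGroup": ["20s", "20s", "30s", "20s", "20s", "20s", "30s"],
    }
)

sport_age_count = df.groupby(["Sport", "AgeGroup"]).size()

# reset_index() moves the index levels back into ordinary columns;
# name= labels the count column instead of leaving it as 0.
table = sport_age_count.reset_index(name="count")
print(table.columns.tolist())  # ['Sport', 'AgeGroup', 'count']

arr = table.to_numpy()  # object array; each row is [sport, age group, count]

# Plain NumPy indexing now works on every column, labels included.
print(arr[:, 0])  # the sport column
baseball_rows = arr[arr[:, 0] == "Baseball"]  # boolean mask on the sport column
print(baseball_rows)
print(arr[:, 2].sum())  # 7, the total number of rows counted
```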
But if you want a list of lists, you can call NumPy's tolist() on that array:
sport_age_count.reset_index().to_numpy().tolist()
[['Baseball', '20s', 2],
['Baseball', '30s', 1],
['Football', '20s', 3],
['Football', '30s', 1]]
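Since the question also asks for an alternative to long for loops, the Series returned by `groupby(...).size()` can be unpacked with a single list comprehension. `Series.items()` yields `(index, value)` pairs, and each index entry of this MultiIndex is a `(sport, age group)` tuple. The second comprehension lowercases the sport names to match the target format in the question exactly; treating that lowercase spelling as intentional is an assumption.

```python
import pandas as pd

# Sample data from the question (column names assumed to be Sport/AgeGroup).
df = pd.DataFrame(
    {
        "Sport": ["Baseball", "Football", "Baseball", "Baseball",
                  "Football", "Football", "Football"],
        "AgeGroup": ["20s", "20s", "30s", "20s", "20s", "20s", "30s"],
    }
)

sport_age_count = df.groupby(["Sport", "AgeGroup"]).size()

# Each item is ((sport, age_group), count); int() turns the NumPy integer
# into a plain Python int.
rows = [[sport, age, int(n)] for (sport, age), n in sport_age_count.items()]
print(rows)
# [['Baseball', '20s', 2], ['Baseball', '30s', 1], ['Football', '20s', 3], ['Football', '30s', 1]]

# Optional: lowercase the sport names to match the question's target format.
lower_rows = [[sport.lower(), age, n] for sport, age, n in rows]
print(lower_rows)
# [['baseball', '20s', 2], ['baseball', '30s', 1], ['football', '20s', 3], ['football', '30s', 1]]
```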