How do I access data inside a pandas dataframe groupby object?

The following code creates df_grouped:

df_grouped = df.groupby(by='Pclass')

The loop below prints each Pclass value along with the number of rows in its group:

for val,grp in df_grouped:
    print('There were',len(grp),'people traveling in',val,'class.')

How does the code access the information? How can val and grp be used without being defined earlier? How is this information stored inside the groupby object?

As noted in the Group By: split-apply-combine documentation, the data are stored in a GroupBy object, which is a data structure with special attributes.

You can verify this for yourself:

>>> type(df_grouped)

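This should return something like the following (the exact module path depends on the pandas version; newer releases report pandas.core.groupby.generic.DataFrameGroupBy):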

<class 'pandas.core.groupby.DataFrameGroupBy'>

The structure of the data is well explained by this snippet from the docs:

The groups attribute is a dict whose keys are the computed unique groups and corresponding values being the axis labels belonging to each group.
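To make the mechanics concrete, here is a minimal sketch on a small made-up frame (the Pclass and Name values are illustrative assumptions, not the asker's data). Iterating over a GroupBy object yields (key, sub-DataFrame) pairs, and the `for val, grp in ...` statement unpacks each pair into two loop variables as it goes, which is why neither name needs to exist beforehand.

```python
import pandas as pd

# Hypothetical stand-in for the question's data
df = pd.DataFrame({
    'Pclass': [1, 3, 3, 2, 1, 3],
    'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
})
df_grouped = df.groupby(by='Pclass')

# .groups maps each group key to the index labels of the rows in that group
groups = {key: list(labels) for key, labels in df_grouped.groups.items()}
print(groups)

# Each iteration step yields a (key, sub-DataFrame) pair,
# unpacked here into val and grp
sizes = {}
for val, grp in df_grouped:
    sizes[val] = len(grp)
    print('There were', len(grp), 'people traveling in', val, 'class.')

# get_group() returns the rows of a single group as a DataFrame
third_class = df_grouped.get_group(3)
print(third_class)
```

Note that the sub-DataFrames are built on demand from the stored labels; the GroupBy object itself mainly records which rows belong to which key.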

As you noticed, you can easily iterate through each individual group. However, there are often vectorized methods that work well with groupby objects and can access information and compute results more efficiently than an explicit loop.
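As one possible illustration, the per-class head count from the question's loop can be obtained in a single call with size(). The frame below is a made-up stand-in for the asker's data.

```python
import pandas as pd

# Hypothetical data; only the grouping column matters here
df = pd.DataFrame({'Pclass': [1, 3, 3, 2, 1, 3]})

# size() returns a Series with one row count per group key
counts = df.groupby('Pclass').size()
print(counts)
```

The result is a Series indexed by Pclass, so a single count can be looked up with counts.loc[3].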

Referencing the docs: "The groups attribute is a dict whose keys are the computed unique groups and corresponding values being the axis labels belonging to each group."

You may be interested in looking into .agg(), for example:

import pandas as pd

df = pd.DataFrame([['Person A', 2, 3, 4],
                   ['Person B', 3, 2, 1],
                   ['Person C', 5, 7, 5],
                   ['Person A', 3, 4, 9],
                   ['Person C', 8, 3, 2]],
                  columns=['Person', 'Val 1', 'Val 2', 'Val 3'])

This gives the following dataframe:

     Person  Val 1  Val 2  Val 3
0  Person A      2      3      4
1  Person B      3      2      1
2  Person C      5      7      5
3  Person A      3      4      9
4  Person C      8      3      2

Then doing a groupby and agg:

df.groupby('Person').agg({'Val 1': 'sum', 'Val 2': 'mean', 'Val 3': 'count'})

Gives:

          Val 1  Val 2  Val 3
Person                       
Person A      5    3.5      2
Person B      3    2.0      1
Person C     13    5.0      2

Here you can simply pass agg a dictionary that specifies the operation to perform on each group for a specific column.
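A related option, not part of the original answer, is named aggregation, which lets you choose the output column names. This is a sketch assuming pandas 0.25 or later; the names val1_sum, val2_mean, and val3_count are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([['Person A', 2, 3, 4],
                   ['Person B', 3, 2, 1],
                   ['Person C', 5, 7, 5],
                   ['Person A', 3, 4, 9],
                   ['Person C', 8, 3, 2]],
                  columns=['Person', 'Val 1', 'Val 2', 'Val 3'])

# Each keyword is an output column name (hypothetical names);
# each value is a (source column, aggregation) pair
summary = df.groupby('Person').agg(
    val1_sum=('Val 1', 'sum'),
    val2_mean=('Val 2', 'mean'),
    val3_count=('Val 3', 'count'),
)
print(summary)
```

The values match the dictionary version above; only the column labels differ.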
