How do pandas groupby and reset_index change the index of a dataframe?

Question

Can someone explain what happens during a reset_index(name='counts') operation after a groupby(...).size() operation on a dataframe? It does exactly what I want (creates a dataframe with a column 'counts' that has the size of each group), but I don't understand why it works.

df = pd.DataFrame( {'letter':['A', 'A', 'B', 'B', 'C'], 'number':[0,0,1,2,0]} )

If I do a groupby + size operation, df.groupby(['letter', 'number']).size(), I get a multi-level index with one 'letter' level and one 'number' level:

sizes = df.groupby(['letter', 'number']).size()
print(sizes.index)

Out: MultiIndex(levels=[[u'A', u'B', u'C'], [0, 1, 2]], labels=[[0, 1, 1, 2], [0, 1, 2, 0]], names=[u'letter', u'number'])

I'm confused about what happens when I add a .reset_index(...) operation:

df = df.groupby(['letter', 'number']).size().reset_index(name='counts')

which produces the following DataFrame with index = RangeIndex(start=0, stop=4, step=1):

  letter  number  counts
0      A       0       2
1      B       1       1
2      B       2       1
3      C       0       1

I'm particularly confused about three points:

The documentation for reset_index doesn't have a keyword argument called 'name', but I've seen a number of posts that recommend using it to create a named size/sum column [1, 2, 3] and it appears to work. Is there some documentation that explains how this name keyword argument works?
The new dataframe after the reset_index has a column named 'counts', but the reset_index documentation doesn't say anything about causing a column to be named, so how does this happen?
Why does the whole multilevel index get reset if we only specified a specific index level ('counts') to be removed?

Answer 1

The text in your question is a bit confusing. When you use groupby you need to provide an argument for the grouping. You may want to edit. I think I can still answer your question...

If you group by one thing, you will typically get a Series back from .size() (and from .count() when it is applied to a single selected column). You can use .index to check what is going on:

In [18]: df1 = pd.DataFrame({'letter':['A', 'A', 'B', 'B', 'C'], 'number':[0,0,1,2,0]})

In [19]: df1                                                                    
Out[19]: 
  letter  number
0      A       0
1      A       0
2      B       1
3      B       2
4      C       0

In [20]: df1.index                                                              
Out[20]: RangeIndex(start=0, stop=5, step=1)

In [21]: df1.groupby('letter').size()                                           
Out[21]: 
letter
A    2
B    2
C    1
dtype: int64

In [22]: size_groups = _                                                        

In [23]: size_groups.index                                                      
Out[23]: Index(['A', 'B', 'C'], dtype='object', name='letter')

In [24]: type(size_groups)                                                      
Out[24]: pandas.core.series.Series
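
Because the result is a Series, the reset_index you call on it is Series.reset_index, not DataFrame.reset_index, and that is likely why the name keyword seemed undocumented. A small sketch to check which of the two accepts name, using the standard-library inspect module (exact signatures can differ between pandas versions, so treat this as a quick check rather than a reference):

```python
import inspect
import pandas as pd

# Parameter names accepted by each reset_index method
series_params = inspect.signature(pd.Series.reset_index).parameters
frame_params = inspect.signature(pd.DataFrame.reset_index).parameters

# Does each method accept a keyword called 'name'?
print('name' in series_params)
print('name' in frame_params)
```

On the pandas versions I am aware of, only the Series method takes name; it sets the column name used for the Series values in the resulting DataFrame.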

So, this is a Series whose index is the list of letters shown above. If you reset this index, pandas moves the old index ('letter') into an ordinary column, gives the result a new default integer index, and puts the sizes into a second column, which makes a DataFrame with two columns:

In [25]: size_groups.reset_index()                                              
Out[25]: 
  letter  0
0      A  2
1      B  2
2      C  1
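
The second column is labelled 0 because the Series returned by size() has no name. Passing name= to reset_index supplies one. A minimal sketch, rebuilding the same df1 as above so it runs on its own:

```python
import pandas as pd

df1 = pd.DataFrame({'letter': ['A', 'A', 'B', 'B', 'C'],
                    'number': [0, 0, 1, 2, 0]})

# Series of group sizes, indexed by 'letter'
size_groups = df1.groupby('letter').size()

# name= sets the label of the column that holds the Series values
named = size_groups.reset_index(name='counts')
print(named)
```

So 'counts' is not an index level being removed; it is just the label given to the column of sizes.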

You won't get a multilevel index out of this unless you group by two things. For instance:

In [43]: df1                                                                    
Out[43]: 
  letter  number
0      A       0
1      A       0
2      B       1
3      B       2
4      C       0

In [44]: df2 = df1.groupby(['letter', 'number']).size()                         

In [45]: df2                                                                    
Out[45]: 
letter  number
A       0         2
B       1         1
        2         1
C       0         1
dtype: int64

In [46]: df2.index                                                              
Out[46]: 
MultiIndex([('A', 0),
            ('B', 1),
            ('B', 2),
            ('C', 0)],
           names=['letter', 'number'])
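
The same applies to the two-level case. With no level argument, reset_index moves every index level into columns, which is why the whole MultiIndex disappears; name only labels the values column. A sketch under the same df1 (the variable names counts and partial are mine, just for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'letter': ['A', 'A', 'B', 'B', 'C'],
                    'number': [0, 0, 1, 2, 0]})

# Series of group sizes with a two-level MultiIndex (letter, number)
df2 = df1.groupby(['letter', 'number']).size()

# No level= given: both levels become columns, values column is 'counts'
counts = df2.reset_index(name='counts')
print(counts)

# level= picks which level(s) to move; 'letter' stays as the index here
partial = df2.reset_index(level='number', name='counts')
print(partial)
```

If you only want one level turned into a column, level= is the argument to use, not name=.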
