Difference between "as_index = False", and "reset_index()" in pandas groupby
I just want to know the difference between what these two do.
Data:
import pandas as pd
df = pd.DataFrame({"ID":["A","B","A","C","A","A","C","B"], "value":[1,2,4,3,6,7,3,4]})
reset_index():
df_group1 = df.groupby("ID").sum().reset_index()
as_index=False:
df_group2 = df.groupby("ID", as_index=False).sum()
Both of them give exactly the same output.
  ID  value
0  A     18
1  B      6
2  C      6
Can anyone tell me what the difference is, with an example that illustrates it?
When you use as_index=False, you indicate to groupby() that you don't want to set the column ID as the index (duh!). When both implementations yield the same results, use as_index=False because it will save you some typing and an unnecessary pandas operation ;)
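To make the equivalence concrete, here is a minimal sketch using the question's data. The variable names with_index, flat and same are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

# Default (as_index=True): the group key "ID" becomes the index of the result.
with_index = df.groupby("ID").sum()

# as_index=False: "ID" stays a regular column and the index is 0, 1, 2, ...
flat = df.groupby("ID", as_index=False).sum()

print(with_index.index.name)   # ID
print(list(flat.columns))      # ['ID', 'value']

# Resetting the index of the first result gives the same frame as the second.
same = with_index.reset_index().equals(flat)
print(same)
```

So for a plain aggregation like this one, reset_index() is just an extra step that moves "ID" back out of the index.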
However, sometimes you want to apply more complicated operations to your groups. On those occasions, you might find that one is better suited than the other.
Example 1: You want to sum the values of three variables (i.e. columns) in a group on both axes.
Using as_index=True allows you to apply a sum over axis=1 without specifying the names of the columns, then sum the values over axis 0. This works because the group key sits in the index, so only the value columns are left to add up. When the operation is finished, you can use reset_index(drop=True/False) to get the dataframe into the right form.
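A rough sketch of one way to read Example 1. The columns x, y, z and the output name "total" are made up for illustration, and the two sums are done in the order that is shortest to write (the total is the same either way).

```python
import pandas as pd

# Hypothetical data: one key column and three numeric variables.
df = pd.DataFrame({
    "ID": ["A", "B", "A", "C"],
    "x": [1, 2, 3, 4],
    "y": [10, 20, 30, 40],
    "z": [100, 200, 300, 400],
})

# as_index=True (the default): "ID" goes to the index, so the result
# holds only the numeric columns x, y, z.
per_group = df.groupby("ID").sum()

# Sum across the columns without naming them; the string key is not in the way.
totals = per_group.sum(axis=1)

# Bring "ID" back as a regular column once the arithmetic is done.
result = totals.reset_index(name="total")
print(result)
```

With as_index=False, "ID" would still be a column of strings at the sum(axis=1) step, so you would have to select x, y and z by name first.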
Example 2: You need to set a value for the group based on the columns used in the groupby().
Setting as_index=False allows you to check the condition on an ordinary column and not on an index, which is often much easier.
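As an illustration of Example 2, assuming you want to tag group "A" differently from the rest. The "label" column and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

# "ID" stays a regular column, so it can be used in a boolean mask directly.
out = df.groupby("ID", as_index=False).sum()

out["label"] = "other"                           # default for every group
out.loc[out["ID"] == "A", "label"] = "priority"  # condition on the key column
print(out)
```

With the default as_index=True, the same condition would have to be written against out.index instead of out["ID"].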
At some point, you might come across a KeyError when applying operations on groups. In that case, it is often because you are trying to use a column in your aggregate function that is currently the index of the grouped result rather than a regular column.
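One common way this shows up, as a small sketch: after a default groupby the key is no longer a column, so looking it up by name fails. The flag name raised is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

grouped = df.groupby("ID").sum()   # "ID" is now the index, not a column

try:
    grouped["ID"]                  # column lookup: "ID" is not among the columns
    raised = False
except KeyError:
    raised = True

print(raised)

# Two ways to reach the keys instead:
ids_from_index = list(grouped.index)
ids_from_column = grouped.reset_index()["ID"].tolist()
```

If you hit this, check the columns of the frame you are working on; either read the key from the index or move it back with reset_index() (or group with as_index=False in the first place).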