Difference between "as_index=False" and "reset_index()" in pandas groupby

I just want to know what the difference is between the operations performed by these two.

Data:

import pandas as pd
df = pd.DataFrame({"ID":["A","B","A","C","A","A","C","B"], "value":[1,2,4,3,6,7,3,4]})

reset_index():

df_group1 = df.groupby("ID").sum().reset_index()

as_index=False:

df_group2 = df.groupby("ID", as_index=False).sum()

Both of them give exactly the same output:

  ID  value
0  A     18
1  B      6
2  C      6

Can anyone explain the difference and give an example that illustrates it?

When you use as_index=False, you tell groupby() that you don't want to set the ID column as the index (duh!). When both implementations yield the same result, use as_index=False, because it saves you some typing and an unnecessary pandas operation ;)
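To make the "same result" claim concrete, here is a small check on the question's own data. It shows that the default groupby("ID").sum() puts ID into the index, that reset_index() moves it back out, and that the result matches the as_index=False version. This is a sketch; exact dtypes can vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

# Default as_index=True: the group keys become the index of the result.
summed = df.groupby("ID").sum()
print(summed.index.name)      # ID
print(list(summed.columns))   # ['value']

# Variant 1: aggregate first, then move "ID" back out of the index.
df_group1 = summed.reset_index()

# Variant 2: tell groupby to keep "ID" as a regular column from the start.
df_group2 = df.groupby("ID", as_index=False).sum()

# Both end up with a default 0..n-1 index, the same columns and the same values.
print(df_group1.equals(df_group2))  # True
```

The second variant simply skips the extra reset_index() step, which is the typing and the operation the answer refers to.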

However, sometimes you want to apply more complicated operations to your groups. In those cases, you may find that one approach is better suited than the other.

Example 1: You want to sum the values of three variables (i.e. columns) in a group along both axes.

Using as_index=True (the default) lets you apply a sum over axis=1 without specifying the column names, because the grouping column sits in the index instead of among the value columns; you can then sum the values over axis=0. When the operation is finished, you can use reset_index() (with drop=True or drop=False, depending on whether you want to keep the group keys) to get the DataFrame back into the right form.
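A minimal sketch of Example 1, assuming a hypothetical DataFrame whose three variables are named x, y and z. With as_index=True the summed frame contains only numeric columns, so sum(axis=1) needs no column list; with as_index=False the string ID column is part of the frame, so the columns have to be named explicitly. Summing over axis 0 within each group and then over axis 1 gives the same totals as doing it in the other order.

```python
import pandas as pd

# Hypothetical data: three numeric variables per row, grouped by "ID".
df = pd.DataFrame({
    "ID": ["A", "B", "A", "C", "A", "B"],
    "x": [1, 2, 3, 4, 5, 6],
    "y": [10, 20, 30, 40, 50, 60],
    "z": [100, 200, 300, 400, 500, 600],
})

# as_index=True (default): "ID" lives in the index, every column is numeric.
per_group = df.groupby("ID").sum()   # sum over rows (axis 0) within each group
totals = per_group.sum(axis=1)       # sum across x, y, z (axis 1), no column list needed

# totals is a Series indexed by ID; reset_index() turns it back into a DataFrame.
result = totals.reset_index(name="total")
print(result)

# as_index=False: "ID" is a column, so the numeric columns must be selected by name.
alt = df.groupby("ID", as_index=False).sum()
alt["total"] = alt[["x", "y", "z"]].sum(axis=1)
print(alt)
```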

Example 2: You need to set a value for each group based on the columns used in groupby().

Setting as_index=False allows you to check the condition on an ordinary column instead of on the index, which is often much easier.
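A short sketch of Example 2 on the question's data. The rule "flag group A" is a hypothetical stand-in for whatever condition you need; the point is only where the group key lives. With as_index=False the test reads like any other column comparison, while the default result forces you to compare against the index instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

# as_index=False keeps "ID" as an ordinary column next to the aggregate.
g = df.groupby("ID", as_index=False).sum()
# Hypothetical rule: set a per-group value based on the groupby key itself.
g["flag"] = np.where(g["ID"] == "A", "special", "normal")
print(g)

# With the default as_index=True, the same condition has to target the index.
h = df.groupby("ID").sum()
h["flag"] = np.where(h.index == "A", "special", "normal")
print(h)
```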

At some point, you might run into a KeyError when applying operations to groups. In that case, it is often because you are trying to use a column in your aggregate function that is currently the index of your GroupBy object.
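A simple case of that KeyError, sketched on the question's data: after the default groupby("ID").sum(), ID is no longer a column, so looking it up as one fails. The two fixes shown (read the keys from the index, or keep them as a column with as_index=False) are just the options discussed above, not the only ones.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

summed = df.groupby("ID").sum()  # "ID" is now the index, not a column

raised = False
try:
    summed["ID"]                 # column lookup on something that is now the index
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Fix 1: read the group keys from the index.
keys_from_index = summed.index.tolist()
# Fix 2: keep the keys as a column in the first place.
keys_from_column = df.groupby("ID", as_index=False).sum()["ID"].tolist()
print(keys_from_index, keys_from_column)
```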
