What is the use of reset_index() in pandas?

Question

While reading this article, I came across this statement.

order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()

Everything other than the reset_index() method call is clear to me. My question is: what will happen if I don't call reset_index() in the sequence given below?

order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()
df_1 = df.merge(order_total)
df_1["Percent_of_Order"] = df_1["ext price"] / df_1["Order_Total"]

I tried to understand this method from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html , but couldn't understand what it means to reset the index of a dataframe.

Answer 1

A simplified explanation: reset_index() takes the current index and places it in a column named 'index' (or in a column named after the index, if the index has a name). It then creates a new sequential index, starting from 0, for the data set.

import pandas as pd

df = pd.DataFrame([20, 30, 40, 50], index=[2, 3, 4, 5])

    0
2  20
3  30
4  40
5  50

df.reset_index()

   index   0
0      2  20
1      3  30
2      4  40
3      5  50
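
As a small sketch of two variations on the example above (the index name "key" is only an illustrative assumption): passing drop=True discards the old index instead of keeping it as a column, and a named index gives its name to the new column.

```python
import pandas as pd

df = pd.DataFrame([20, 30, 40, 50], index=[2, 3, 4, 5])

# Default: the old index becomes a column called "index".
default_reset = df.reset_index()

# drop=True: the old index is thrown away rather than kept as a column.
dropped = df.reset_index(drop=True)

# A named index ("key" is a made-up name) keeps its name as the column label.
named = df.rename_axis("key").reset_index()

print(list(default_reset.columns))
print(list(dropped.columns))
print(list(named.columns))
```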

Answer 2

reset_index() creates a new index starting from 0. If a column had been set as the index, it is moved back into the DataFrame as a regular column.

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Kitty",
            "Hello Puppy",
            "It is an Helloexample",
            "for stackoverflow",
            "Hello World",
        ],
    }
)
newdf = df.set_index('ID')

print(newdf)
print(newdf.reset_index())

Output before reset_index():

                     name
ID                       
1             Hello Kitty
2             Hello Puppy
3   It is an Helloexample
4       for stackoverflow
5             Hello World

Output after reset_index():

   ID                   name
0   1            Hello Kitty
1   2            Hello Puppy
2   3  It is an Helloexample
3   4      for stackoverflow
4   5            Hello World
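
One detail worth noting, shown here as a minimal sketch with a shortened, made-up version of the data above: reset_index() returns a new DataFrame by default and leaves the original untouched, so the result has to be assigned to a name to be kept.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3], "name": ["x", "y", "z"]})
newdf = df.set_index("ID")

# The returned DataFrame is discarded here, so newdf is still indexed by ID.
newdf.reset_index()
index_name_before = newdf.index.name

# Assigning the result back is what actually replaces the index.
newdf = newdf.reset_index()

print(index_name_before)
print(list(newdf.columns))
```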

Answer 3

I think it is better here to use GroupBy.transform, which returns a new Series of the same size as the original DataFrame, filled with the aggregated values, so merge is not necessary:

import pandas as pd

df_1 = pd.DataFrame({
    'A': list('abcdef'),
    'ext price': [5, 3, 6, 9, 2, 4],
    'order': list('aaabbb'),
})

order_total1 = df_1.groupby('order')["ext price"].transform('sum')
df_1["Percent_of_Order"] = df_1["ext price"] / order_total1
print (df_1)
   A  ext price order  Percent_of_Order
0  a          5     a          0.357143
1  b          3     a          0.214286
2  c          6     a          0.428571
3  d          9     b          0.600000
4  e          2     b          0.133333
5  f          4     b          0.266667

My question is what will happen if I don't call reset_index() considering the sequence?

Before reset_index() the result is a Series. reset_index() converts that Series into a two-column DataFrame, where the first column is named after the index and the second column is named after the Series.

order_total = df_1.groupby('order')["ext price"].sum().rename("Order_Total")
print (order_total)
order
a    14
b    15
Name: Order_Total, dtype: int64

print (type(order_total))
<class 'pandas.core.series.Series'>

print (order_total.name)
Order_Total

print (order_total.index.name)
order

print (order_total.reset_index())
  order  Order_Total
0     a           14
1     b           15

The reason a two-column DataFrame is necessary in your code is that merge is called without any parameters. In that case the on parameter defaults to the intersection of the column names common to both DataFrames, which here is the order column.
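
To make the role of the order column concrete, here is a minimal sketch on a reduced version of the sample data. It compares merging after reset_index() with an alternative that leaves order in the index and names the keys explicitly through left_on and right_index. The second variant is not from the article, only one possible way to merge without resetting the index.

```python
import pandas as pd

df = pd.DataFrame({
    "order": list("aaabbb"),
    "ext price": [5, 3, 6, 9, 2, 4],
})

# Without reset_index(), "order" lives in the index of the aggregated Series.
order_total = df.groupby("order")["ext price"].sum().rename("Order_Total")

# Option 1: turn the index into a column, then merge on the shared "order" column.
merged_on_column = df.merge(order_total.reset_index(), on="order")

# Option 2: keep the index and tell merge to match it against df["order"].
merged_on_index = df.merge(
    order_total.to_frame(), left_on="order", right_index=True
)

print(merged_on_column["Order_Total"].tolist())
print(merged_on_index["Order_Total"].tolist())
```

Both variants attach the same per-order totals to every row. The reset_index() form is simply the one that works with a bare df.merge(order_total) call.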

Answer 4

To answer your question:

My question is what will happen if I don't call reset_index() considering the sequence?

Without reset_index(), the result is indexed by the keys you applied the group-by on: 'order' in your case (grouping on several keys would give a multi-index). Specific to the article, the merge done after the group-by matches on the columns the two objects have in common, so leaving 'order' in the index of one of them can make the merge fail or go wrong.

Hence, reset_index() is needed to perform the correct merge.
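
As a rough illustration of how the index depends on the group-by keys (the "sku" column and the values are invented for this sketch): one key gives an ordinary index, several keys give a MultiIndex, and reset_index() turns every level back into a regular column.

```python
import pandas as pd

df = pd.DataFrame({
    "order": ["a", "a", "a", "b"],
    "sku": ["x", "x", "y", "x"],   # hypothetical second key
    "ext price": [5, 3, 6, 9],
})

# One key: a plain index named "order".
single = df.groupby("order")["ext price"].sum()

# Two keys: a MultiIndex with levels "order" and "sku".
multi = df.groupby(["order", "sku"])["ext price"].sum()

# reset_index() moves both levels back into columns.
flat = multi.reset_index()

print(type(single.index).__name__)
print(type(multi.index).__name__)
print(list(flat.columns))
```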
