What is the use of reset_index() in pandas?

While reading this article, I came across this statement:

order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()

Everything except the reset_index() call is clear to me. What would happen if I did not call reset_index() in the sequence below?

order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()
df_1 = df.merge(order_total)
df_1["Percent_of_Order"] = df_1["ext price"] / df_1["Order_Total"]

I tried to understand this method from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html , but I could not work out what it means to reset the index of a DataFrame.

A simplified explanation: reset_index() takes the current index and moves it into a regular column. That column is called 'index' when the index has no name, and takes the index's name otherwise. It then builds a new default integer index (0, 1, 2, ...) for the data set.

import pandas as pd

df = pd.DataFrame([20, 30, 40, 50], index=[2, 3, 4, 5])

    0
2  20
3  30
4  40
5  50

df.reset_index()

   index   0
0      2  20
1      3  30
2      4  40
3      5  50
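
To make the naming rule concrete, here is a small sketch of what reset_index() does with an unnamed index, a named index, and drop=True. The index name "key" is made up for illustration and is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame([20, 30, 40, 50], index=[2, 3, 4, 5])

# Unnamed index: the old index values land in a column called "index".
unnamed = df.reset_index()

# Named index ("key" is a hypothetical name): the new column takes that name.
named = df.rename_axis("key").reset_index()

# drop=True discards the old index instead of keeping it as a column.
dropped = df.reset_index(drop=True)

print(list(unnamed.columns))
print(list(named.columns))
print(list(dropped.columns))
```

In all three cases the result has a fresh 0..n-1 index; the only difference is whether, and under which name, the old index survives as a column.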

reset_index() creates a new index starting from 0. If a column had been set as the index, it is removed from the index and becomes a regular column again.

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Kitty",
            "Hello Puppy",
            "It is an Helloexample",
            "for stackoverflow",
            "Hello World",
        ],
    }
)
newdf = df.set_index('ID')

print(newdf)                # before reset_index()
print(newdf.reset_index())  # after reset_index()

Output before reset_index():

                     name
ID                       
1             Hello Kitty
2             Hello Puppy
3   It is an Helloexample
4       for stackoverflow
5             Hello World

Output after reset_index():

   ID                   name
0   1            Hello Kitty
1   2            Hello Puppy
2   3  It is an Helloexample
3   4      for stackoverflow
4   5            Hello World

I think a better approach here is GroupBy.transform. It returns a new Series the same size as the original DataFrame, filled with the aggregated values, so merge is not necessary:

import pandas as pd

df_1 = pd.DataFrame({
    'A': list('abcdef'),
    'ext price': [5, 3, 6, 9, 2, 4],
    'order': list('aaabbb'),
})

# transform('sum') returns one value per original row: the total of that row's order
order_total1 = df_1.groupby('order')["ext price"].transform('sum')
df_1["Percent_of_Order"] = df_1["ext price"] / order_total1
print(df_1)
   A  ext price order  Percent_of_Order
0  a          5     a          0.357143
1  b          3     a          0.214286
2  c          6     a          0.428571
3  d          9     b          0.600000
4  e          2     b          0.133333
5  f          4     b          0.266667
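
As a sanity check, the sketch below runs both routes on the same sample data and compares them. It assumes the merge version from the question and the transform version should yield identical percentages; the variable names merged and via_transform are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "A": list("abcdef"),
    "ext price": [5, 3, 6, 9, 2, 4],
    "order": list("aaabbb"),
})

# Route 1 (from the question): aggregate, reset_index, merge back on "order".
order_total = (
    df.groupby("order")["ext price"].sum().rename("Order_Total").reset_index()
)
merged = df.merge(order_total)
merged["Percent_of_Order"] = merged["ext price"] / merged["Order_Total"]

# Route 2: transform broadcasts each group's sum back onto the original rows.
via_transform = df["ext price"] / df.groupby("order")["ext price"].transform("sum")

# Largest absolute difference between the two results.
max_diff = (merged["Percent_of_Order"] - via_transform).abs().max()
print(max_diff)
```

The transform route skips the intermediate DataFrame entirely, which is why reset_index() never comes up there.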

My question is what will happen if I don't call reset_index() considering the sequence?

Before reset_index() the result is a Series. Calling reset_index() converts that Series into a two-column DataFrame: the first column is named after the index (order) and the second after the Series (Order_Total).

order_total = df_1.groupby('order')["ext price"].sum().rename("Order_Total")
print (order_total)
order
a    14
b    15
Name: Order_Total, dtype: int64

print (type(order_total))
<class 'pandas.core.series.Series'>

print (order_total.name)
Order_Total

print (order_total.index.name)
order

print (order_total.reset_index())
  order  Order_Total
0     a           14
1     b           15

Your code needs the two-column DataFrame because merge is called without any parameters. In that case it joins on the intersection of the column names shared by both DataFrames, which here is the order column.
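
To see what that means in practice, here is a sketch that skips reset_index() and then tries the merge. It assumes a recent pandas version, where merge accepts a named Series; the names totals and bare_merge_failed are mine. The second merge is an alternative that matches the order column against the index explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    "A": list("abcdef"),
    "ext price": [5, 3, 6, 9, 2, 4],
    "order": list("aaabbb"),
})

# No reset_index(): a Series whose index is named "order".
totals = df.groupby("order")["ext price"].sum().rename("Order_Total")

# A bare merge looks for shared *column* names; "order" lives only in the index.
try:
    df.merge(totals)
    bare_merge_failed = False
except ValueError as err:  # pandas raises MergeError, a ValueError subclass
    bare_merge_failed = True
    print(type(err).__name__)

# Alternative: match the "order" column of df against the index of totals.
merged = df.merge(totals, left_on="order", right_index=True)
print(merged[["order", "Order_Total"]])
```

So reset_index() is not the only option, but it is what lets the plain df.merge(order_total) call find a common column.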

To answer your question:

My question is what will happen if I don't call reset_index() considering the sequence?

Without reset_index(), the key you grouped by ('order' in your case) stays in the index of the result instead of becoming a column. With a single key this is an ordinary named index; grouping by several keys gives a MultiIndex. Specific to the article, the merge that follows the group-by expects 'order' to be a column on both sides, so leaving it in the index can make the merge fail or go wrong.

Hence reset_index() is needed for the merge to work correctly.
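
A quick way to check what kind of index a group-by leaves behind is sketched below, using made-up sample data with the same column names. It also shows as_index=False, which I believe gives the same flat layout without a separate reset_index() call.

```python
import pandas as pd

df = pd.DataFrame({
    "A": list("abcdef"),
    "ext price": [5, 3, 6, 9, 2, 4],
    "order": list("aaabbb"),
})

one_key = df.groupby("order")["ext price"].sum()
two_keys = df.groupby(["order", "A"])["ext price"].sum()

print(isinstance(one_key.index, pd.MultiIndex))   # single key: plain Index
print(isinstance(two_keys.index, pd.MultiIndex))  # several keys: MultiIndex

# reset_index() turns every index level back into a regular column.
flat = two_keys.reset_index()
print(list(flat.columns))

# as_index=False keeps the grouping key as a column from the start.
no_index = df.groupby("order", as_index=False)["ext price"].sum()
print(list(no_index.columns))
```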
