What is the use of reset_index() in pandas?
While reading this article, I came across this statement:
order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()
Everything except the reset_index() method call is clear to me. My question is: what will happen if I don't call reset_index(), given the sequence below?
order_total = df.groupby('order')["ext price"].sum().rename("Order_Total").reset_index()
df_1 = df.merge(order_total)
df_1["Percent_of_Order"] = df_1["ext price"] / df_1["Order_Total"]
I tried to understand this method from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html , but couldn't understand what it means to reset the index of a DataFrame.
A simplified explanation is that reset_index() takes the current index and places it in a column named 'index' (or, if the index has a name, in a column with that name). It then creates a new sequential index, starting from 0, for the data set.
df = pd.DataFrame([20, 30, 40, 50], index=[2, 3, 4, 5])
    0
2  20
3  30
4  40
5  50
df.reset_index()
   index   0
0      2  20
1      3  30
2      4  40
3      5  50
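Two details the example above does not show are what happens when the index has a name and what the drop argument does. The following is a minimal sketch on made-up data; the column name "value" and the index name "key" are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: same values as above, but the index is given a name.
df = pd.DataFrame({"value": [20, 30, 40, 50]},
                  index=pd.Index([2, 3, 4, 5], name="key"))

# A named index becomes a column with that name instead of 'index'.
kept = df.reset_index()
print(kept.columns.tolist())      # ['key', 'value']

# drop=True discards the old index rather than keeping it as a column.
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())   # ['value']
print(dropped.index.tolist())     # [0, 1, 2, 3]

# reset_index() returns a new object by default; df itself is unchanged.
print(df.index.tolist())          # [2, 3, 4, 5]
```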
reset_index() creates a new index starting from 0. If a column had been set as the index, it stops being the index and goes back to being an ordinary column.
import pandas as pd
df = pd.DataFrame(
{
"ID": [1, 2, 3, 4, 5],
"name": [
"Hello Kitty",
"Hello Puppy",
"It is an Helloexample",
"for stackoverflow",
"Hello World",
],
}
)
newdf = df.set_index('ID')
print(newdf)                # before reset_index()
print(newdf.reset_index())  # after reset_index()
Output before reset_index():
name
ID
1 Hello Kitty
2 Hello Puppy
3 It is an Helloexample
4 for stackoverflow
5 Hello World
Output after reset_index():
ID name
0 1 Hello Kitty
1 2 Hello Puppy
2 3 It is an Helloexample
3 4 for stackoverflow
4 5 Hello World
I think it is better here to use GroupBy.transform, which returns a new Series of the same size as the original DataFrame, filled with the aggregated values, so merge is not necessary:
df_1 = pd.DataFrame({
'A':list('abcdef'),
'ext price':[5,3,6,9,2,4],
'order':list('aaabbb')
})
order_total1 = df_1.groupby('order')["ext price"].transform('sum')
df_1["Percent_of_Order"] = df_1["ext price"] / order_total1
print (df_1)
A ext price order Percent_of_Order
0 a 5 a 0.357143
1 b 3 a 0.214286
2 c 6 a 0.428571
3 d 9 b 0.600000
4 e 2 b 0.133333
5 f 4 b 0.266667
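As a rough check that the transform route gives the same percentages as the original aggregate, reset_index() and merge route, both can be computed side by side on the sample data. This is only a sketch; the names pct_merge, pct_transform and same_result are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": list("abcdef"),
    "ext price": [5, 3, 6, 9, 2, 4],
    "order": list("aaabbb"),
})

# Route 1: aggregate, reset the index, merge back on the shared 'order' column.
order_total = (df.groupby("order")["ext price"].sum()
                 .rename("Order_Total").reset_index())
merged = df.merge(order_total)
pct_merge = merged["ext price"] / merged["Order_Total"]

# Route 2: transform broadcasts each group's sum back onto the original rows.
pct_transform = df["ext price"] / df.groupby("order")["ext price"].transform("sum")

# Compare the two results row by row, in positional order.
same_result = max(abs(a - b) for a, b in zip(pct_merge, pct_transform)) < 1e-12
print(same_result)
```

The transform version avoids building an intermediate DataFrame and keeps the original row index, which is why no merge is needed.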
My question is what will happen if I don't call reset_index() considering the sequence?
Before reset_index() the result is a Series. Calling reset_index() converts that Series into a two-column DataFrame: the first column takes the name of the index and the second takes the name of the Series.
order_total = df_1.groupby('order')["ext price"].sum().rename("Order_Total")
print (order_total)
order
a 14
b 15
Name: Order_Total, dtype: int64
print (type(order_total))
<class 'pandas.core.series.Series'>
print (order_total.name)
Order_Total
print (order_total.index.name)
order
print (order_total.reset_index())
order Order_Total
0 a 14
1 b 15
The reason your code needs a two-column DataFrame is that merge is called with no parameters. In that case the on parameter defaults to the intersection of the column names common to both DataFrames, which here is the order column.
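The default join key can be made visible by computing the shared column names by hand and comparing the implicit merge with one that spells out on="order". This is a small sketch on sample data; the variable names common, implicit and explicit are assumptions used only here.

```python
import pandas as pd

df = pd.DataFrame({"order": list("aaabbb"), "ext price": [5, 3, 6, 9, 2, 4]})
order_total = (df.groupby("order")["ext price"].sum()
                 .rename("Order_Total").reset_index())

# The only column name shared by both frames is 'order'.
common = sorted(set(df.columns) & set(order_total.columns))
print(common)  # ['order']

implicit = df.merge(order_total)              # on defaults to the common columns
explicit = df.merge(order_total, on="order")  # the same join, spelled out
print(implicit.equals(explicit))
```

Writing on="order" explicitly is arguably the safer habit, since the join no longer changes silently if the two frames later gain another column with the same name.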
To answer your question:
My question is what will happen if I don't call reset_index() considering the sequence?
The result will be indexed by the keys you grouped on, 'order' in your case (grouping on several keys would give a MultiIndex). Specific to the article, the group keys then live in the index rather than in a column, so the merge done after the group-by has no shared 'order' column to match on. Hence, reset_index() is needed to perform the merge correctly.
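To see the effect of leaving out reset_index(), the merge can be attempted directly with the Series. The sketch below assumes a recent pandas version, where merging on no common columns raises pandas.errors.MergeError; the flag merge_failed is a name made up for this example. It also shows one possible alternative that joins the 'order' column against the Series index instead.

```python
import pandas as pd

df = pd.DataFrame({"order": list("aaabbb"), "ext price": [5, 3, 6, 9, 2, 4]})

# No reset_index(): this is a Series whose index is named 'order'.
order_total = df.groupby("order")["ext price"].sum().rename("Order_Total")

try:
    df.merge(order_total)   # 'order' is an index here, not a shared column
    merge_failed = False
except pd.errors.MergeError:
    merge_failed = True
print(merge_failed)

# Alternative without reset_index(): match the 'order' column to the index.
df_1 = df.merge(order_total, left_on="order", right_index=True)
df_1["Percent_of_Order"] = df_1["ext price"] / df_1["Order_Total"]
print(df_1)
```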