Equivalent Python and pandas operation for group_by + mutate + indexing column vectors within mutate in R
Sample data frame in Python:
import pandas as pd

d = {'col1': ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
     'col2': [3, 4, 5, 1, 3, 9, 5, 7, 23]}
df = pd.DataFrame(data=d)
Now I want to get the same output in Python with pandas as I get in R with the code below. In other words, I want the percentage change in col2 within each group of col1.
data.frame(col1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
           col2 = c(3, 4, 5, 1, 3, 9, 5, 7, 23)) -> df
library(dplyr)  # provides the %>% pipe used below

df %>%
  dplyr::group_by(col1) %>%
  dplyr::mutate(perc = (dplyr::last(col2) - col2[1]) / col2[1])
In Python, I tried:
def perc_change(column):
    index_1 = tu_in[column].iloc[0]
    index_2 = tu_in[column].iloc[-1]
    perc_change = (index_2 - index_1) / index_1
    return(perc_change)

d = {'col1': ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
     'col2': [3, 4, 5, 1, 3, 9, 5, 7, 23]}
df = pd.DataFrame(data=d)
df.assign(perc_change = lambda x: x.groupby["col1"]["col2"].transform(perc_change))
But it gives me an error saying: 'method' object is not subscriptable.
I am new to Python and am trying to convert some R code into Python. How can I solve this in an elegant way? Thank you!
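For reference, the error in this attempt comes from x.groupby["col1"]: groupby is a method, so it has to be called as x.groupby("col1"), and indexing the uncalled method with [...] is what raises "'method' object is not subscriptable". A second problem is that perc_change reads from tu_in, which is not defined here; transform passes each group's col2 values to the function as a Series, so the function should work on that argument instead. A minimal repaired sketch of the same attempt (the parameter name s is only an illustrative choice) could look like this:

```python
import pandas as pd

d = {'col1': ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
     'col2': [3, 4, 5, 1, 3, 9, 5, 7, 23]}
df = pd.DataFrame(data=d)


def perc_change(s: pd.Series) -> float:
    # transform calls this once per group, passing that group's col2 values as a Series
    first = s.iloc[0]   # plays the role of col2[1] in the R code
    last = s.iloc[-1]   # plays the role of dplyr::last(col2)
    return (last - first) / first


# groupby must be called with parentheses; x.groupby["col1"] is what raised the error
df = df.assign(perc_change=lambda x: x.groupby("col1")["col2"].transform(perc_change))
print(df)
```

Because the function returns one scalar per group, transform broadcasts that value to every row of the group, so each row of group a gets about 0.666667, group b gets 8.0 and group c gets 3.6.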
You don't want transform here. transform is typically used when your aggregation returns a scalar value per group and you want to broadcast that result to every row of that group in the original DataFrame. GroupBy.pct_change already returns a result indexed like the original DataFrame, so you can compute it and assign it straight back as a new column. Note that pct_change gives the change from each row to the previous row within the group, not the overall first-to-last change.
df['perc_change'] = df.groupby('col1')['col2'].pct_change()
#   col1  col2  perc_change
# 0    a     3          NaN
# 1    a     4     0.333333
# 2    a     5     0.250000
# 3    b     1          NaN
# 4    b     3     2.000000
# 5    b     9     2.000000
# 6    c     5          NaN
# 7    c     7     0.400000
# 8    c    23     2.285714
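To make the difference between an aggregation, transform and pct_change concrete, here is a small sketch on the same data; "max" is just an arbitrary per-group reduction picked for illustration, not something the question needs:

```python
import pandas as pd

d = {'col1': ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
     'col2': [3, 4, 5, 1, 3, 9, 5, 7, 23]}
df = pd.DataFrame(data=d)
g = df.groupby("col1")["col2"]

# An aggregation collapses each group to one scalar: one row per group key (a, b, c)
per_group = g.agg("max")
print(per_group)

# transform broadcasts that per-group scalar back onto every original row
broadcast = g.transform("max")
print(broadcast)

# pct_change already returns one value per original row, so no transform is needed
changes = g.pct_change()
print(changes)
```

agg returns a Series indexed by the group keys, while transform and pct_change both return a Series aligned with df.index, which is why those two can be assigned directly as a new column.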
But if what you need instead is the overall percentage change within each group, i.e. the difference between the last and first values divided by the first value (which is what the R code computes), then you do want transform:
df.groupby('col1')['col2'].transform(lambda x: (x.iloc[-1] - x.iloc[0])/x.iloc[0])
0 0.666667
1 0.666667
2 0.666667
3 8.000000
4 8.000000
5 8.000000
6 3.600000
7 3.600000
8 3.600000
Name: col2, dtype: float64
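A vectorized variant, sketched here as an alternative to the lambda, uses the built-in "first" and "last" group reductions with transform, which broadcast each group's first and last value to every row, and then does the arithmetic on whole columns, assigning the result back as a column much like mutate does in R (the column name perc is taken from the R code):

```python
import pandas as pd

d = {'col1': ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
     'col2': [3, 4, 5, 1, 3, 9, 5, 7, 23]}
df = pd.DataFrame(data=d)

g = df.groupby("col1")["col2"]
# Built-in 'first'/'last' reductions, broadcast back to every row of each group
first = g.transform("first")  # plays the role of col2[1] in the R code
last = g.transform("last")    # plays the role of dplyr::last(col2)

# Counterpart of mutate(perc = (last(col2) - col2[1]) / col2[1])
df["perc"] = (last - first) / first
print(df)
```

One caveat: pandas' "first" and "last" skip missing values by default, whereas col2[1] and dplyr::last in R take the positional element even if it is NA. With no missing values, as here, the two agree; both also depend on the row order within each group.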