Apply function to all columns in pandas groupby dataframe
I have the following dataframe (let's call it data):
id | type | val1 | val2 |
-------------------------
1 | A | 10.1 | 11.0 |
1 | B | 10.5 | 11.2 |
2 | A | 10.7 | 10.9 |
2 | B | 10.6 | 11.1 |
3 | A | 10.3 | 10.5 |
3 | B | 10.4 | 11.3 |
and I want to obtain the difference between the types A and B (A - B) for each id, for each valX column, i.e. I want the result to be:
id | val1 | val2 |
------------------
1 | -0.4 | -0.2 |
2 | 0.1 | -0.2 |
3 | -0.1 | -0.8 |
The only way I could get this done was to define a function:
def getDelta(df, valName):
    return df[df['type'] == 'A'][valName].values[0] - df[df['type'] == 'B'][valName].values[0]
and apply it for each column separately:
data.groupby('id').apply(getDelta,valName='val1')
and then merge the results to obtain what I was looking for.
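For completeness, a self-contained sketch of this per-column approach (rebuilding the sample frame, applying getDelta to each valX column, and concatenating the resulting Series) might look like:

```python
import pandas as pd

def getDelta(df, valName):
    # difference between the single 'A' row and the single 'B' row for this column
    return df[df['type'] == 'A'][valName].values[0] - df[df['type'] == 'B'][valName].values[0]

data = pd.DataFrame({
    'id':   [1, 1, 2, 2, 3, 3],
    'type': ['A', 'B', 'A', 'B', 'A', 'B'],
    'val1': [10.1, 10.5, 10.7, 10.6, 10.3, 10.4],
    'val2': [11.0, 11.2, 10.9, 11.1, 10.5, 11.3],
})

# apply per column, then combine the per-column Series into one frame
result = pd.concat(
    {col: data.groupby('id').apply(getDelta, valName=col) for col in ['val1', 'val2']},
    axis=1,
).reset_index()
print(result)
```

This prints the desired id / val1 / val2 table, but it still scans each group once per column, which is the inefficiency the question is asking about.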
Is there a more efficient way to do it? In the end I want to apply a function to a subset of columns of the grouped dataframe, but this function has to take into account the values of another column.
As it currently is, you can use np.subtract.reduce, assuming 'A' comes before 'B' in all cases and there are no duplicates:
df.groupby("id", sort = False).agg(np.subtract.reduce).reset_index()
id val1 val2
0 1 -0.4 -0.2
1 2 0.1 -0.2
2 3 -0.1 -0.8
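As a quick illustration of what np.subtract.reduce does per group (my own note, not part of the original answer): it folds subtraction left to right over the group's values, so with exactly one 'A' row followed by one 'B' row it computes A - B, which is why both caveats above matter:

```python
import numpy as np

# with two values, reduce is a single left-to-right subtraction: 10.1 - 10.5
print(np.subtract.reduce([10.1, 10.5]))      # about -0.4

# with three values it would compute (a - b) - c, which is why the
# "no duplicates" caveat matters
print(np.subtract.reduce([10.0, 3.0, 2.0]))  # 5.0
```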
You can pivot the dataframe:
x = df.pivot(index="id", columns="type", values="val1")
y = df.pivot(index="id", columns="type", values="val2")
df = pd.concat([x["A"] - x["B"], y["A"] - y["B"]], axis=1).rename(
columns={0: "val1", 1: "val2"}
)
print(df)
Prints:
val1 val2
id
1 -0.4 -0.2
2 0.1 -0.2
3 -0.1 -0.8
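A variant of the same idea (my own sketch, not from the original answer) pivots all valX columns in one call and subtracts across the resulting column level, which avoids one pivot per column:

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 2, 2, 3, 3],
    'type': ['A', 'B', 'A', 'B', 'A', 'B'],
    'val1': [10.1, 10.5, 10.7, 10.6, 10.3, 10.4],
    'val2': [11.0, 11.2, 10.9, 11.1, 10.5, 11.3],
})

# pivot both value columns at once; columns become a (valX, type) MultiIndex
p = df.pivot(index='id', columns='type', values=['val1', 'val2'])

# subtract the 'B' slice from the 'A' slice across the 'type' column level
out = p.xs('A', axis=1, level='type') - p.xs('B', axis=1, level='type')
print(out)
```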
You can groupby() your id column and use diff(-1) on your valX columns. Wrapping the operation in concat() will give you your desired outcome.
df.set_index('id', inplace=True)
pd.concat([df.groupby(['id'])[df.filter(like='val').columns.tolist()].diff(-1).dropna()]).reset_index()
id val1 val2 val3
0 1 -0.4 -0.2 -3.1
1 2 0.1 -0.2 17.0
2 3 -0.1 -0.8 1.5
I have added an extra valX column just for illustration purposes.
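A self-contained version of the same diff(-1) idea on the original two-column data (my own sketch; it assumes the 'A' row always precedes the 'B' row within each id):

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 2, 2, 3, 3],
    'type': ['A', 'B', 'A', 'B', 'A', 'B'],
    'val1': [10.1, 10.5, 10.7, 10.6, 10.3, 10.4],
    'val2': [11.0, 11.2, 10.9, 11.1, 10.5, 11.3],
})

val_cols = df.filter(like='val').columns.tolist()

# diff(-1) subtracts the next row (B) from the current row (A) within each id;
# dropna() discards the B rows, whose diff against a nonexistent next row is NaN
out = df.groupby('id')[val_cols].diff(-1).dropna()
out.insert(0, 'id', df.loc[out.index, 'id'].values)
print(out.reset_index(drop=True))
```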