I have a dataframe and would like to subtract two columns of the previous row, provided that the previous row has the same Names value. If it does not, I would like the result to be NaN, filled with '-'. My groupby expression raises the error TypeError: 'Series' objects are mutable, thus they cannot be hashed, which is very ambiguous. What am I missing?
import pandas as pd
df = pd.DataFrame(data=[['Person A', 5, 8], ['Person A', 13, 11], ['Person B', 11, 32], ['Person B', 15, 20]], columns=['Names', 'Value', 'Value1'])
df['diff'] = df.groupby('Names').apply(df['Value'].shift(1) - df['Value1'].shift(1)).fillna('-')
print(df)
Desired Output:
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21
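apply expects a function that it calls once per group, but the original code passes it an already-computed Series. Wrap the expression in lambda x, change df['Value'] to x['Value'] (and likewise for Value1), and finish with reset_index: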
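# reset_index(drop=True) renumbers the result 0..n-1; this assumes df has a
# default index and that its rows are already ordered by Names
df['diff'] = (df.groupby('Names')
                .apply(lambda x: x['Value'].shift(1) - x['Value1'].shift(1))
                .fillna('-')
                .reset_index(drop=True))
print(df)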
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21
Another solution uses DataFrameGroupBy.shift:
# select several columns with a list; tuple-style selection is rejected by newer pandas
df1 = df.groupby('Names')[['Value', 'Value1']].shift()
print(df1)
   Value  Value1
0    NaN     NaN
1    5.0     8.0
2    NaN     NaN
3   11.0    32.0
df['diff'] = (df1.Value - df1.Value1).fillna('-')
print(df)
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21
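A possible variation on the shift approach, offered as a sketch rather than as part of the original answer: since both columns are shifted by the same amount, the per-row difference can be computed first and then shifted once within each group. The intermediate name row_diff is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 5, 8], ['Person A', 13, 11],
          ['Person B', 11, 32], ['Person B', 15, 20]],
    columns=['Names', 'Value', 'Value1'],
)

# Difference within each row, computed before any grouping
row_diff = df['Value'] - df['Value1']

# Shift that difference down by one row inside each Names group;
# the first row of every group has no predecessor and becomes NaN
df['diff'] = row_diff.groupby(df['Names']).shift(1)
print(df)
```

The result keeps the index of df, so no reset_index step is needed; .fillna('-') can be appended if the placeholder is wanted.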
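You can also do it this way (note that this variant fills the missing values with 0 rather than '-', which keeps the column numeric). The column order is fixed explicitly before diff so that the result does not depend on the order in which groupby returns the columns:
In [76]: df['diff'] = df.groupby('Names')[['Value1','Value']].shift(1)[['Value1','Value']].diff(axis=1)['Value'].fillna(0)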
In [77]: df
Out[77]:
      Names  Value  Value1  diff
0  Person A      5       8   0.0
1  Person A     13      11  -3.0
2  Person B     11      32   0.0
3  Person B     15      20 -21.0
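If the '-' is only needed for display, one option (a sketch, not something the answers above propose) is to leave NaN in the column and substitute the placeholder when rendering, using the na_rep parameter of DataFrame.to_string. The variable names prev and rendered are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 5, 8], ['Person A', 13, 11],
          ['Person B', 11, 32], ['Person B', 15, 20]],
    columns=['Names', 'Value', 'Value1'],
)

# Previous-row values within each Names group
prev = df.groupby('Names')[['Value', 'Value1']].shift(1)

# Keep NaN in the data so the column stays float64
df['diff'] = prev['Value'] - prev['Value1']

# Replace NaN with '-' only in the printed representation
rendered = df.to_string(na_rep='-')
print(rendered)

# Numeric operations still work because the column holds no strings
print(df['diff'].sum())
```

Filling with '-' directly, as in the first two answers, turns the column into object dtype, which is fine for a report but awkward for further arithmetic.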