pandas: Calculate the difference from a grouped average
I have sensor data for multiple sensors by month and year:
import pandas as pd
df = pd.DataFrame([
['A', 'Jan', 2015, 13],
['A', 'Feb', 2015, 10],
['A', 'Jan', 2016, 12],
['A', 'Feb', 2016, 11],
['B', 'Jan', 2015, 7],
['B', 'Feb', 2015, 8],
['B', 'Jan', 2016, 4],
['B', 'Feb', 2016, 9]
], columns = ['sensor', 'month', 'year', 'value'])
In [2]: df
Out[2]:
sensor month year value
0 A Jan 2015 13
1 A Feb 2015 10
2 A Jan 2016 12
3 A Feb 2016 11
4 B Jan 2015 7
5 B Feb 2015 8
6 B Jan 2016 4
7 B Feb 2016 9
I calculated the average for each sensor and month with a groupby:
month_avg = df.groupby(['sensor', 'month']).mean()['value']
In [3]: month_avg
Out[3]:
sensor month
A Feb 10.5
Jan 12.5
B Feb 8.5
Jan 5.5
Now I want to add a column to df with the difference from the monthly averages, something like this:
sensor month year value diff_from_avg
0 A Jan 2015 13 0.5
1 A Feb 2015 10 -0.5
2 A Jan 2016 12 -0.5
3 A Feb 2016 11 0.5
4 B Jan 2015 7 1.5
5 B Feb 2015 8 -0.5
6 B Jan 2016 4 -1.5
7 B Feb 2016 9 0.5
I tried giving df the same multi-index as month_avg and then doing a simple subtraction, but that did not work:
df = df.set_index(['sensor', 'month'])
df['diff_from_avg'] = month_avg - df.value
Thank you for any advice.
Assign a new column with transform:
diff_from_avg=df.value - df.groupby(['sensor', 'month']).value.transform('mean')
df.assign(diff_from_avg=diff_from_avg)
sensor month year value diff_from_avg
0 A Jan 2015 13 0.5
1 A Feb 2015 10 -0.5
2 A Jan 2016 12 -0.5
3 A Feb 2016 11 0.5
4 B Jan 2015 7 1.5
5 B Feb 2015 8 -0.5
6 B Jan 2016 4 -1.5
7 B Feb 2016 9 0.5
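This works because transform returns a result with the same index as the original frame, so the subtraction lines up row by row. A minimal, self-contained sketch using the question's data (the intermediate name group_mean is introduced here only for illustration):

```python
import pandas as pd

df = pd.DataFrame([
    ['A', 'Jan', 2015, 13],
    ['A', 'Feb', 2015, 10],
    ['A', 'Jan', 2016, 12],
    ['A', 'Feb', 2016, 11],
    ['B', 'Jan', 2015, 7],
    ['B', 'Feb', 2015, 8],
    ['B', 'Jan', 2016, 4],
    ['B', 'Feb', 2016, 9],
], columns=['sensor', 'month', 'year', 'value'])

# transform('mean') broadcasts each group's mean back onto every row of that group,
# so group_mean has one entry per row of df and shares df's index.
group_mean = df.groupby(['sensor', 'month'])['value'].transform('mean')

# Both operands share the same index, so this is a plain row-wise subtraction.
result = df.assign(diff_from_avg=df['value'] - group_mean)
print(result)
```

Because assign returns a new DataFrame, df itself is left unchanged; assign the result back to df if the column should persist.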
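Try:
# group_keys=False keeps the original row index, so the result aligns with df (recent pandas versions otherwise prepend the group keys to the index)
df['diff_from_avg'] = df.groupby(['sensor', 'month'], group_keys=False)['value'].apply(lambda x: x - x.mean())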
Out[18]:
sensor month year value diff_from_avg
0 A Jan 2015 13 0.5
1 A Feb 2015 10 -0.5
2 A Jan 2016 12 -0.5
3 A Feb 2016 11 0.5
4 B Jan 2015 7 1.5
5 B Feb 2015 8 -0.5
6 B Jan 2016 4 -1.5
7 B Feb 2016 9 0.5
You need to set the DataFrame's index to match that of the grouped Series; then you can subtract directly:
df.set_index(['sensor','month'], inplace=True)
df['diff'] = df['value'] - month_avg
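After set_index, the (sensor, month) index contains duplicate entries, and how well arithmetic aligns on such an index may depend on the pandas version. One alternative, sketched here under that assumption, is to leave df's index alone and look up the averages with join. The names joined and the helper column month_avg are illustrative choices, not part of the original answers:

```python
import pandas as pd

df = pd.DataFrame([
    ['A', 'Jan', 2015, 13],
    ['A', 'Feb', 2015, 10],
    ['A', 'Jan', 2016, 12],
    ['A', 'Feb', 2016, 11],
    ['B', 'Jan', 2015, 7],
    ['B', 'Feb', 2015, 8],
    ['B', 'Jan', 2016, 4],
    ['B', 'Feb', 2016, 9],
], columns=['sensor', 'month', 'year', 'value'])

# Series of means indexed by (sensor, month), as in the question.
month_avg = df.groupby(['sensor', 'month'])['value'].mean()

# join matches df's sensor/month columns against month_avg's MultiIndex;
# the Series needs a name, which becomes the new column's label.
joined = df.join(month_avg.rename('month_avg'), on=['sensor', 'month'])
joined['diff_from_avg'] = joined['value'] - joined['month_avg']
print(joined)
```

This keeps the original integer index and leaves the group mean visible next to each value, which can help when checking the result.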