I can solve this problem, but not in a pythonic way. Given the following dataframe:
time rssi key1 key2 CMA
0 0.021 -71 P A NaN
1 0.022 -60 Q A NaN
2 0.025 -56 P B NaN
3 0.12 -70 Q B NaN
4 0.167 -65 P A NaN
5 0.210 -55 P B NaN
6 0.211 -74 Q A NaN
7 0.213 -62 Q B NaN
...
compute the cumulative moving average (CMA) of rssi row by row. Iterate over increasing time, but group by key1, key2. This is equivalent to saying that four CMAs shall be computed: (P,A), (P,B), (Q,A), (Q,B). Finally, each computed CMA shall be put in the CMA column.
Note 1: I know an RSSI average should not be computed with this formula; that does not matter here.
Note 2 : CMA formula is avg(n) = (avg(n-1) * (n-1) + value(n))/n
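As a quick sanity check of the recurrence in Note 2, here is a minimal sketch in plain Python. The helper name cma_recurrence is made up for illustration; the input is just the two rssi values of the (P,A) group from the sample data.

```python
def cma_recurrence(values):
    """Apply avg(n) = (avg(n-1) * (n-1) + value(n)) / n step by step."""
    averages = []
    avg = 0.0  # avg(0) is never used on its own: it is multiplied by 0 at n = 1
    for n, value in enumerate(values, start=1):
        avg = (avg * (n - 1) + value) / n
        averages.append(avg)
    return averages


# rssi values of the (P, A) group, in time order (rows 0 and 4)
pa_rssi = [-71, -65]
cma = cma_recurrence(pa_rssi)
print(cma)
```

Each entry is the plain mean of all values seen so far, which is why a per-group running (expanding) mean gives the same numbers.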
Example 1: defines the groupby() strategy.
time rssi key1 key2 CMA
0 0.021 -71 P A NaN <<-- first value can stay NaN or default to rssi (i.e. -71)
4 0.167 -65 P A -68
...
Example 2: desired output.
time rssi key1 key2 CMA
0 0.021 -71 P A NaN
1 0.022 -60 Q A NaN
2 0.025 -56 P B NaN
3 0.12 -70 Q B NaN
4 0.167 -65 P A -68
5 0.210 -55 P B -55.5
6 0.211 -74 Q A -67
7 0.213 -62 Q B -66
...
So far, this is what I can come up with:
import pandas as pd
import numpy as np

df = pd.DataFrame()
df['time'] = [0.021,0.022,0.025,0.12,0.167,0.210,0.211,0.213]
df['rssi'] = [-71,-60,-56,-70,-65,-55,-74,-62]
df['key1'] = ['P','Q','P','Q','P','P','Q','Q']
df['key2'] = ['A','A','B','B','A','B','A','B']
df["CMA"] = np.nan

for key, grp in df.groupby(['key1', 'key2']):
    i = 0          # number of rows already processed in this group
    old_index = 0  # index label of the previous row in this group
    for index, row in grp.iterrows():
        if i == 0:
            # allowed alternative: first value defaults to rssi
            df.at[index, 'CMA'] = grp.at[index, 'rssi']
        else:
            # avg(n) = (avg(n-1) * (n-1) + value(n)) / n, with n = i + 1
            df.at[index, 'CMA'] = ((df.at[old_index, 'CMA'] * i) + df.at[index, 'rssi']) / (i + 1)
        old_index = index
        i += 1

print(df)
This works, but it's ugly. There must be a less painful, more pythonic way to achieve the same result. How can I improve this without explicitly setting each cell value for that column?
You can do groupby().expanding().mean() with a reset_index:
df['CMA'] = (df.groupby(['key1','key2'],
as_index=False)['rssi']
.expanding(min_periods=2).mean()
.reset_index(level=0, drop=True)
)
Output:
time rssi key1 key2 CMA
0 0.021 -71 P A NaN
1 0.022 -60 Q A NaN
2 0.025 -56 P B NaN
3 0.120 -70 Q B NaN
4 0.167 -65 P A -68.0
5 0.210 -55 P B -55.5
6 0.211 -74 Q A -67.0
7 0.213 -62 Q B -66.0
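As a hedged alternative, the same per-group running mean can be sketched with transform, which returns a result aligned to the original index and so avoids depending on the index layout that groupby().expanding() returns. The second variant uses a running sum divided by a running count and fills the first row of each group with its own rssi (the "allowed alternative" from the question). The column name CMA_first_filled is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "time": [0.021, 0.022, 0.025, 0.12, 0.167, 0.210, 0.211, 0.213],
    "rssi": [-71, -60, -56, -70, -65, -55, -74, -62],
    "key1": ["P", "Q", "P", "Q", "P", "P", "Q", "Q"],
    "key2": ["A", "A", "B", "B", "A", "B", "A", "B"],
})

# The rows are already sorted by time; otherwise sort with df.sort_values("time") first.
grouped = df.groupby(["key1", "key2"])["rssi"]

# Variant 1: expanding mean per group; min_periods=2 leaves the first row of each group as NaN.
df["CMA"] = grouped.transform(lambda s: s.expanding(min_periods=2).mean())

# Variant 2: running sum / running count; the first row of each group equals its own rssi.
df["CMA_first_filled"] = grouped.cumsum() / (grouped.cumcount() + 1)

print(df)
```

Variant 2 avoids the Python-level lambda, which may matter on large frames; variant 1 stays closer to the expanding().mean() call shown above.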