I can't figure out a problem I am trying to solve. I have a pandas data frame coming from this:
date, id, measure, result
2016-07-11, 31, "[2, 5, 3, 3]", 1
2016-07-12, 32, "[3, 5, 3, 3]", 1
2016-07-13, 33, "[2, 1, 2, 2]", 1
2016-07-14, 34, "[2, 6, 3, 3]", 1
2016-07-15, 35, "[39, 31, 73, 34]", 0
2016-07-16, 36, "[3, 2, 3, 3]", 1
2016-07-17, 37, "[3, 8, 3, 3]", 1
Measurements column consists of arrays in string format.
I want to have a new moving-average-array
column from the past 3 measurement records, excluding those records where the result
is 0. Past 3 records mean that for id
34, the arrays of id
31,32,33 to be used.
It is about taking the average of every 1st point, 2nd point, 3rd and 4th point to have this moving-average-array
.
It is not about getting the average of 1st array, 2nd array ... and then averaging the average, no .
For the first 3 rows, because there is not enough history, I just want to use their own measurement. So the solution should look like this:
date, id, measure, result . Solution
2016-07-11, 31, "[2, 5, 3, 3]", 1, "[2, 5, 3, 3]"
2016-07-12, 32, "[3, 5, 3, 3]", 1, "[3, 5, 3, 3]"
2016-07-13, 33, "[2, 1, 2, 2]", 1, "[2, 1, 2, 2]"
2016-07-14, 34, "[2, 6, 3, 3]", 1, "[2.3, 3.6, 2.6, 2.6]"
2016-07-15, 35, "[39, 31, 73, 34]", 0, "[2.3, 4, 2.6, 2.6]"
2016-07-16, 36, "[3, 2, 3, 3]", 1, "[2.3, 4, 2.6, 2.6]"
2016-07-17, 37, "[3, 8, 3, 3]", 1, "[2.3, 3, 2.6, 2.6]"
The real data is bigger. result
0 may repeat 2 or more times after each other also. I think it will be about keeping a track of previous OK result
s properly getting those averages. I spent time but I could not.
I am posting the dataframe here:
mydict = {'date': {0: '2016-07-11',
1: '2016-07-12',
2: '2016-07-13',
3: '2016-07-14',
4: '2016-07-15',
5: '2016-07-16',
6: '2016-07-17'},
'id': {0: 31, 1: 32, 2: 33, 3: 34, 4: 35, 5: 36, 6: 37},
'measure': {0: '[2, 5, 3, 3]',
1: '[3, 5, 3, 3]',
2: '[2, 1, 2, 2]',
3: '[2, 6, 3, 3]',
4: '[39, 31, 73, 34]',
5: '[3, 2, 3, 3]',
6: '[3, 8, 3, 3]'},
'result': {0: 1, 1: 1, 2: 1, 3: 1, 4: 0, 5: 1, 6: 1}}
df = pd.DataFrame(mydict)
Thank you for giving directions or pointing out how to.
Solution using only 1 for loop:
Considering the data:
mydict = {'date': {0: '2016-07-11',
1: '2016-07-12',
2: '2016-07-13',
3: '2016-07-14',
4: '2016-07-15',
5: '2016-07-16',
6: '2016-07-17'},
'id': {0: 31, 1: 32, 2: 33, 3: 34, 4: 35, 5: 36, 6: 37},
'measure': {0: '[2, 5, 3, 3]',
1: '[3, 5, 3, 3]',
2: '[2, 1, 2, 2]',
3: '[2, 6, 3, 3]',
4: '[39, 31, 73, 34]',
5: '[3, 2, 3, 3]',
6: '[3, 8, 3, 3]'},
'result': {0: 1, 1: 1, 2: 1, 3: 1, 4: 0, 5: 1, 6: 1}}
df = pd.DataFrame(mydict)
I defined a simple function to calculate the means and return a list. Then, loop the dataframe applying the rules:
def calc_mean(in_list):
p0 = round((in_list[0][0] + in_list[1][0] + in_list[2][0])/3,1)
p1 = round((in_list[0][1] + in_list[1][1] + in_list[2][1])/3,1)
p2 = round((in_list[0][2] + in_list[1][2] + in_list[2][2])/3,1)
p3 = round((in_list[0][3] + in_list[1][3] + in_list[2][3])/3,1)
return [p0, p1, p2, p3]
Solution = []
aux_list = []
for index, row in df.iterrows():
if index in [0,1,2]:
Solution.append(row.measure)
aux_list.append([int(x) for x in row.measure[1:-1].split(', ')])
else:
Solution.append('[' +', '.join(map(str, calc_mean(aux_list))) + ']')
if row.result > 0:
aux_list.pop(0)
aux_list.append([int(x) for x in row.measure[1:-1].split(', ')])
df['Solution'] = Solution
The output is:
Please note that the result is rounded to 1 decimal place, a bit different from your desired output. Made more sense to me.
EDIT:
As a suggestion in the comments by @Frenchy, to deal with result == 0 in the first 3 rows, we need to change a bit the first if clause:
if index in [0,1,2] or len(aux_list) <3:
Solution.append(row.measure)
if row.result > 0:
aux_list.append([int(x) for x in row.measure[1:-1].split(', ')])
You can use pd.eval
to change from a str
of list
to a proper list
only the part of the data in measure
where result
is not 0. Use rolling
with mean
and then shift
to get the rolling average over the last 3 rows at the next row. Then map
to str
once your dataframe is changed to a list of list with values
and tolist
. Finally you just need to replace the first three rows and ffill
the missing data:
df.loc[df.result.shift() != 0,'solution'] = list(map(str,
pd.DataFrame(pd.eval(df[df.result != 0].measure))
.rolling(3).mean().shift().values.tolist()))
df.loc[:2,'solution'] = df.loc[:2,'measure']
df.solution = df.solution.ffill()
Here's another solution:
# get data to reproduce example
from io import StringIO
data = StringIO("""
date;id;measure;result
2016-07-11;31;"[2,5,3,3]";1
2016-07-12;32;"[3,5,3,3]";1
2016-07-13;33;"[2,1,2,2]";1
2016-07-14;34;"[2,6,3,3]";1
2016-07-15;35;"[39,31,73,34]";0
2016-07-16;36;"[3,2,3,3]";1
2016-07-17;37;"[3,8,3,3]";1
""")
df = pd.read_csv(data, sep=";")
df
# Out:
# date id measure result
# 0 2016-07-11 31 [2,5,3,3] 1
# 1 2016-07-12 32 [3,5,3,3] 1
# 2 2016-07-13 33 [2,1,2,2] 1
# 3 2016-07-14 34 [2,6,3,3] 1
# 4 2016-07-15 35 [39,31,73,34] 0
# 5 2016-07-16 36 [3,2,3,3] 1
# 6 2016-07-17 37 [3,8,3,3] 1
# convert values in measure column to lists
from ast import literal_eval
dm = df['measure'].apply(literal_eval)
# apply rolling mean with period 2 and recollect values into list in column means
df["means"] = dm.apply(pd.Series).rolling(2, min_periods=0).mean().values.tolist()
df
# Out:
# date id measure result means
# 0 2016-07-11 31 [2,5,3,3] 1 [2.0, 5.0, 3.0, 3.0]
# 1 2016-07-12 32 [3,5,3,3] 1 [2.5, 5.0, 3.0, 3.0]
# 2 2016-07-13 33 [2,1,2,2] 1 [2.5, 3.0, 2.5, 2.5]
# 3 2016-07-14 34 [2,6,3,3] 1 [2.0, 3.5, 2.5, 2.5]
# 4 2016-07-15 35 [39,31,73,34] 0 [20.5, 18.5, 38.0, 18.5]
# 5 2016-07-16 36 [3,2,3,3] 1 [21.0, 16.5, 38.0, 18.5]
# 6 2016-07-17 37 [3,8,3,3] 1 [3.0, 5.0, 3.0, 3.0]
# moving window of size 3
df["means"] = dm.apply(pd.Series).rolling(3, min_periods=0).mean().round(2).values.tolist()
df
# Out:
# date id measure result means
# 0 2016-07-11 31 [2,5,3,3] 1 [2.0, 5.0, 3.0, 3.0]
# 1 2016-07-12 32 [3,5,3,3] 1 [2.5, 5.0, 3.0, 3.0]
# 2 2016-07-13 33 [2,1,2,2] 1 [2.33, 3.67, 2.67, 2.67]
# 3 2016-07-14 34 [2,6,3,3] 1 [2.33, 4.0, 2.67, 2.67]
# 4 2016-07-15 35 [39,31,73,34] 0 [14.33, 12.67, 26.0, 13.0]
# 5 2016-07-16 36 [3,2,3,3] 1 [14.67, 13.0, 26.33, 13.33]
# 6 2016-07-17 37 [3,8,3,3] 1 [15.0, 13.67, 26.33, 13.33]
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.