I have many arrays of different lengths; three are shown here as an example. I want the element-wise average across the arrays, where each position is averaged over only the arrays that have an element at that position.
a = [0, 10, 20]
b = [10, 40, 60, 80]
c = [50, 70]
Expected outcome = [20, 40, 40, 80]
What I've tried is using zip_longest from itertools and using the mean function from statistics.
from itertools import zip_longest
from statistics import mean
outcome = [mean(n) for n in zip_longest(a, b, c, fillvalue=0)]
However, with the fill value set to 0, the padded zeros are included in each average, so the outcome is not the desired one. Because I am using the mean function, I cannot set fillvalue to None. Would I have to use a different function to calculate the mean, or is there another method to get an element-wise average of arrays of different lengths?
Edit: Apologies, I forgot to mention the origin of the arrays. They come from a pandas DataFrame, where each row of one column holds an array of some length x.
Edit2: Adding more meaningful data
Create a DataFrame from a CSV file using pandas
Select the portion of the DataFrame that satisfies 2 conditions
Try to get the element-wise average of the 3rd column for the rows that satisfy the 2 conditions
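import numpy as np
import pandas as pd
from itertools import zip_longest
df = pd.read_csv('data.csv')
sec1 = df[(df['Color'] == 'blue') & (df['Type'] == 21)]
# Unpack the column with * so that each row's array is a separate argument to zip_longest
outcome = [np.nanmean(n) for n in zip_longest(*sec1['time'], fillvalue=float("nan"))]
print(outcome)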
Here sec1['time'] produces the following output, where the arrays have different lengths:
2168 [0, 10, 20, 29, 44, 47, 59, 71, 94, 198...
2169 [0, 0, 7, 12, 47, 84, 144, 163, 222...
...
One approach is to use nan as the fillvalue and filter those values out (using filterfalse) when computing the mean, as below:
from itertools import zip_longest, filterfalse
from statistics import mean
from math import isnan
a = [0, 10, 20]
b = [10, 40, 60, 80]
c = [50, 70]
outcome = [mean(filterfalse(isnan, n)) for n in zip_longest(a, b, c, fillvalue=float("nan"))]
print(outcome)
Output
[20, 40, 40, 80]
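If you would rather keep None as the fill value, as the question considered, the padding can be dropped with a generator expression before it ever reaches mean. A minimal sketch of that variant, using the same sample lists:

```python
from itertools import zip_longest
from statistics import mean

a = [0, 10, 20]
b = [10, 40, 60, 80]
c = [50, 70]

# zip_longest pads the shorter lists with None (its default fillvalue);
# the generator drops the padding so mean only sees real values.
outcome = [
    mean(x for x in column if x is not None)
    for column in zip_longest(a, b, c)
]
print(outcome)  # [20, 40, 40, 80]
```

This only works if None never appears as a genuine value in the data.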
I suggest you use fmean:
from statistics import fmean
outcome = [fmean(filterfalse(isnan, n)) for n in zip_longest(a, b, c, fillvalue=float("nan"))]
print(outcome)
fmean is faster than mean; from the documentation:
This runs faster than the mean() function and it always returns a float. The data may be a sequence or iterable. If the input dataset is empty, raises a StatisticsError.
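Since the question mentions having many arrays rather than three, the same idea can be wrapped in a small helper that takes any iterable of lists. This is only a sketch: the names elementwise_mean and rows are made up for illustration, and it assumes every value is numeric.

```python
from itertools import zip_longest
from math import isnan
from statistics import fmean


def elementwise_mean(rows):
    """Average position by position, ignoring rows too short to reach that position."""
    nan = float("nan")
    return [
        # Drop the nan padding before averaging each column.
        fmean(x for x in column if not isnan(x))
        for column in zip_longest(*rows, fillvalue=nan)
    ]


# Hypothetical input standing in for the real collection of arrays.
rows = [[0, 10, 20], [10, 40, 60, 80], [50, 70]]
print(elementwise_mean(rows))  # [20.0, 40.0, 40.0, 80.0]
```

Note that genuine nan values in the data would be skipped as well, because they cannot be told apart from the padding.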
Another alternative is to use numpy's nanmean:
from itertools import zip_longest
import numpy as np
a = [0, 10, 20]
b = [10, 40, 60, 80]
c = [50, 70]
outcome = [np.nanmean(n) for n in zip_longest(a, b, c, fillvalue=float("nan"))]
print(outcome)
Output
[20.0, 40.0, 40.0, 80.0]
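With a large number of arrays, it may be worth padding everything into one 2-D array and calling nanmean once along axis 0, instead of once per position. The following is a sketch under that assumption; rows is a hypothetical stand-in for the real collection of arrays, and whether it is actually faster depends on the data.

```python
import numpy as np

# Hypothetical input standing in for the real collection of arrays.
rows = [[0, 10, 20], [10, 40, 60, 80], [50, 70]]

# Build a (number of rows) x (longest row) array pre-filled with nan.
width = max(len(r) for r in rows)
padded = np.full((len(rows), width), np.nan)
for i, r in enumerate(rows):
    padded[i, :len(r)] = r  # copy the real values; the tail stays nan

# Average down each column, ignoring the nan padding.
outcome = np.nanmean(padded, axis=0)
print(outcome.tolist())  # [20.0, 40.0, 40.0, 80.0]
```

The padded array uses memory proportional to the number of rows times the longest row, so this trades memory for fewer Python-level loops.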