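Add mean, median and standard deviation values as new array columns in Python
I am trying to compute the mean, median and standard deviation for each index value (date) in the DataFrame below and add them as new columns: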
import pandas as pd
#create a dictionary "salesDict"
salesDict = {'Samsung Galaxy S10': [769.34, 834.23, 900.12, 1021.12],
'iPhone X': [983.11, 881.21, 1210.32, 1100.34],
'Google Pixel 4': [1021.18, 1321.12, 832.14, 992.15]}
#create a tuple called "dates"
dates=( '01/01/2020', '01/02/2020', '01/03/2020', '01/04/2020')
#create a dataframe called "sales" with "dates" as index
sales=pd.DataFrame(salesDict,index=dates)
print(sales)
#create a Mean column that contains mean value for each date
sales["Mean"]=sales.mean(axis = 1)
#create a Median column that contains median value for each date
sales["Median"]=sales.median(axis = 1)
#create a Std column that contains the standard deviation for each date
sales["Std"]=sales.std(axis=1)
sales.drop(['Samsung Galaxy S10', 'iPhone X', 'Google Pixel 4'], axis=1, inplace=True)
print(sales)
The output looks like this:
            Samsung Galaxy S10  iPhone X  Google Pixel 4
01/01/2020              769.34    983.11         1021.18
01/02/2020              834.23    881.21         1321.12
01/03/2020              900.12   1210.32          832.14
01/04/2020             1021.12   1100.34          992.15
                   Mean       Median         Std
01/01/2020   924.543333   953.826667   96.879804
01/02/2020  1012.186667   946.698333  192.155044
01/03/2020   980.860000   940.490000  143.694352
01/04/2020  1037.870000  1029.495000   39.779060
Only the mean is correct; the values in the other two columns are wrong. Can anyone guide me on how to fix this? I am a complete beginner in Python. Thanks a lot!
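The problem occurs because each statistic is computed over every column present in the DataFrame at that moment. The median therefore includes the newly added Mean column, and the standard deviation includes both the new Mean and Median columns.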
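To make the cause concrete, here is a minimal sketch that uses only the first row of the question's data. The variable names `products`, `median_all` and `median_products` are introduced here for illustration and are not part of the original code.

```python
import pandas as pd

# First row of the question's data, enough to reproduce the effect.
sales = pd.DataFrame(
    {
        "Samsung Galaxy S10": [769.34],
        "iPhone X": [983.11],
        "Google Pixel 4": [1021.18],
    },
    index=["01/01/2020"],
)

# Remember the original price columns before adding anything (illustrative name).
products = list(sales.columns)

sales["Mean"] = sales.mean(axis=1)

# Median over all four columns: the three prices plus the new Mean column.
median_all = sales.median(axis=1)

# Median over the three price columns only.
median_products = sales[products].median(axis=1)

print(median_all.iloc[0])       # about 953.826667, the wrong value from the question
print(median_products.iloc[0])  # 983.11, the middle of the three prices
```

The first printed value matches the incorrect Median shown in the question, which suggests the extra Mean column is indeed what skews it.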
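To avoid this, compute the mean, median and standard deviation from the product columns only, by selecting them explicitly: sales[["Samsung Galaxy S10", "iPhone X", "Google Pixel 4"]].
These changes should correct your results: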
sales["Mean"]=sales[["Samsung Galaxy S10", "iPhone X", "Google Pixel 4"]].mean(axis = 1)
sales["Median"]=sales[["Samsung Galaxy S10", "iPhone X", "Google Pixel 4"]].median(axis = 1)
sales["Std"]=sales[["Samsung Galaxy S10", "iPhone X", "Google Pixel 4"]].std(axis = 1)
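As an alternative sketch (not part of the original answer), the three statistics can be computed into a separate DataFrame before any of them is attached to `sales`, so no statistic can see another. The names `stats` and `result` are assumptions chosen for this example.

```python
import pandas as pd

sales_dict = {
    "Samsung Galaxy S10": [769.34, 834.23, 900.12, 1021.12],
    "iPhone X": [983.11, 881.21, 1210.32, 1100.34],
    "Google Pixel 4": [1021.18, 1321.12, 832.14, 992.15],
}
dates = ["01/01/2020", "01/02/2020", "01/03/2020", "01/04/2020"]
sales = pd.DataFrame(sales_dict, index=dates)

# Every statistic is evaluated while `sales` still holds only the price columns.
stats = pd.DataFrame(
    {
        "Mean": sales.mean(axis=1),
        "Median": sales.median(axis=1),
        "Std": sales.std(axis=1),  # sample standard deviation (ddof=1 by default)
    }
)

# Attach the statistics next to the prices; `stats` alone gives the summary table.
result = sales.join(stats)
print(result)
```

Printing `stats` on its own gives the three-column summary without needing `drop(..., inplace=True)` on the price columns.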