
Calculating aggregate values in a pandas dataframe with multiple columns

I have a pandas DataFrame with multi-level columns.

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
print(df)

first        bar                 baz                 foo                 qux  \
second       one       two       one       two       one       two       one   
A      -0.093829 -0.159939 -0.386961 -0.367417  0.625646  1.286186  0.429855   
B       0.440266  0.345161  1.798363 -1.265215  0.204303 -1.492993 -1.714360   
C       0.689076 -1.211060 -0.265888  0.769467 -0.706941  0.086907 -0.892892 

first             
second       two  
A      -1.006210  
B      -0.275578  
C      -0.563757

I want to calculate the mean and standard deviation of each column, grouped by the upper column level. Once I have the mean and standard deviation, I want to double the columns in the lower level, appending the name of the statistic (mean or std) to each column name as "column name" + "_" + "mean" or "std".

from itertools import product

group_cols = df.groupby(df.columns.get_level_values('first'), axis=1)
list_stat_dfs = []
for key, group in group_cols:
    group_descr = group.describe().loc[['mean', 'std'], :]  # Get mean and std from single site
    group_descr.loc[:, (key, 'stats')] = group_descr.index
    group_descr.loc[:, (key, 'first')] = key
    group_descr.columns = group_descr.columns.droplevel(0)  # Remove upper level column (site_name)
    group_descr = group_descr.pivot(columns='stats', index='first')  # Rows to columns
    col_prod = list(product(group_descr.columns.levels[0], group_descr.columns.levels[1]))
    cols = ['_'.join((col[0], col[1])) for col in col_prod]
    group_descr.columns = pd.MultiIndex.from_product(([key], cols))  # From multiple columns to single column
    group_descr.reset_index(inplace=True)
    list_stat_dfs.append(group_descr)

group_descr = pd.concat(list_stat_dfs, axis=1)
print(group_descr)

first       bar                              first       baz            \
         one_mean   one_std  two_mean  two_std        one_mean   one_std   
0   bar  0.507185  1.799053 -0.249692  1.41507   baz -0.147664  0.595927  

                     first       foo                               first  \
   two_mean   two_std        one_mean   one_std  two_mean   two_std         
0  0.160018  1.405113   foo -0.433644  1.245972  0.254995  0.846983   qux 

        qux                                
   one_mean   one_std  two_mean   two_std  
0  0.667629  0.315417 -0.757989  0.683273  
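
For reference, the per-group statistics can also be collected without `groupby(..., axis=1)`, which recent pandas releases (2.1 and later) flag as deprecated. The following is only a minimal sketch: the seed and the `per_group` name are assumptions made for illustration, and it covers just the mean/std step, not the renaming.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
columns = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

np.random.seed(0)  # arbitrary seed, used only to make the sketch repeatable
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=columns)

# per_group is a hypothetical name: one small stats frame per top-level label
per_group = {}
for key in df.columns.get_level_values('first').unique():
    # df[key] selects the sub-frame whose columns are 'one' and 'two'
    per_group[key] = df[key].agg(['mean', 'std'])

print(per_group['bar'])
```

Each value in `per_group` has the rows `mean` and `std` and the lower-level labels as columns.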

As you can see, I managed to do it with a for loop and a few lines of code. Can someone do the same thing in a more efficient way? I am quite sure that with pandas the same result can be obtained in just a few lines of code.

I think you need to get the mean and std of df, then concat them together and reshape with unstack:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))

np.random.seed(1000)
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
print(df)
first        bar                 baz                 foo                 qux  \
second       one       two       one       two       one       two       one   
A      -0.804458  0.320932 -0.025483  0.644324 -0.300797  0.389475 -0.107437   
B       0.595036 -0.464668  0.667281 -0.806116 -1.196070 -0.405960 -0.182377   
C      -0.138422  0.705692  1.271795 -0.986747 -0.334835 -0.099482  0.407192   

first             
second       two  
A      -0.479983  
B       0.103193  
C       0.919388  

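# Stack the per-column mean and std, then move the 'first' level into the columns
df = pd.concat([df.mean(), df.std()], keys=('mean', 'std')).unstack(1)
# Build a two-level index: a constant 0 plus '<second>_<stat>' labels
df.index = [[0] * len(df.index), ['_'.join((col[1], col[0])) for col in df.index]]
# Move the '<second>_<stat>' labels under each 'first' column
df = df.unstack()
print(df)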
first       bar                                     baz                      \
       one_mean   one_std  two_mean   two_std  one_mean   one_std  two_mean   
0     -0.115948  0.700018  0.187319  0.596511  0.637865  0.649139 -0.382846   

first                 foo                                     qux           \
        two_std  one_mean   one_std  two_mean   two_std  one_mean  one_std   
0      0.894129 -0.610567  0.507346 -0.038656  0.401191  0.039126  0.32095   

first                      
       two_mean   two_std  
0      0.180866  0.702911  
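
If the chain of unstack calls is hard to follow, the same one-row result can be assembled more explicitly. This is only a sketch of an alternative, not the method above: the helper name `flatten_stats` and the column level name `second_stat` are made up for illustration, and it assumes the two-level column layout from the question.

```python
import numpy as np
import pandas as pd

def flatten_stats(frame):
    """Return one row with columns (first, '<second>_mean' / '<second>_std').

    flatten_stats is a hypothetical helper, not part of pandas.
    """
    mean, std = frame.mean(), frame.std()
    data = {}
    for first, second in frame.columns:
        # Insertion order gives one_mean, one_std, two_mean, two_std per group
        data[(first, f"{second}_mean")] = mean[(first, second)]
        data[(first, f"{second}_std")] = std[(first, second)]
    # Tuple keys become a two-level column index
    result = pd.DataFrame(data, index=[0])
    result.columns.names = ['first', 'second_stat']  # level names are an assumption
    return result

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
columns = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

np.random.seed(1000)
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=columns)

result = flatten_stats(df)
print(result)
```

The explicit loop trades brevity for readability; for wide frames the concat/unstack version is likely the better choice.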

