How to plot aggregate results after groupby in Pandas?

I've recently started learning Pandas and I'm having some trouble figuring out how to plot results after using groupby and agg. Using Pandas, I created a data frame and grouped it by two columns, 'x' and 'ID'. Then I selected one specific column ('results') from the groups to calculate the standard error of the mean (sem) and the mean.

Specifically, the code:

import numpy as np
import pandas as pd
from scipy import stats
df = pd.read_csv('pandas_2015-11-7.csv')
df_group = df.groupby(['x','ID'])['results']
df_group_results = df_group.agg([stats.sem, np.mean])

The results look like the following:

                sem      mean
x    ID                    
2.5  0     0.010606  0.226674
     1     0.000369  0.490820
     2     0.000508  0.494094
5.0  0     0.001672  0.005059
     1     0.012252  0.190962
     2     0.003696  0.170342
7.5  0     0.001630  0.004506
     1     0.002567  0.016109
     2     0.002081  0.047301
10.0 0     0.000000  0.000000
     1     0.000000  0.000000
     2     0.000000  0.000000
12.5 0     0.000000  0.000000
     1     0.000000  0.000000
     2     0.000000  0.000000

My question is: how do I make a line plot with error bars based on these results? The x-axis should be based on the 'x' value, and 'ID' determines the lines (in this case, 3 lines with legend entries 0, 1, and 2). The plot I want to achieve looks like an example from the matplotlib documentation (image not shown; source: matplotlib.org).

Aggregating after groupby() on two columns returns a result with a hierarchical index (MultiIndex):

http://pandas.pydata.org/pandas-docs/stable/advanced.html
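As a quick check of that claim, here is a minimal sketch on made-up data. Only the column names 'x', 'ID' and 'results' come from the question; the values are random placeholders, and the built-in "sem" aggregation of pandas is used in place of stats.sem so that the snippet does not need SciPy.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV in the question (values are placeholders).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "x": np.repeat([2.5, 5.0], 6),   # two x values, six rows each
    "ID": np.tile([0, 1, 2], 4),     # three IDs, cycled through the rows
    "results": rng.random(12),
})

# Group by both keys and aggregate one column, as in the question.
summary = raw.groupby(["x", "ID"])["results"].agg(["mean", "sem"])

print(type(summary.index).__name__)  # MultiIndex
print(list(summary.index.names))     # ['x', 'ID']
print(summary.shape)                 # (6, 2)
```

Each row of summary is keyed by an (x, ID) pair, which is why selecting rows for plotting needs tuples or a reshaping step.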

If I create a df with a similar hierarchical index:

import pandas as pd
df = pd.DataFrame({'mean':[0.5,0.25,0.7,0.8],'sem':[0.1,0.1,0.1,0.2]})
df.index = pd.MultiIndex(levels=[[2.5,5.0],[0,1]],codes=[[0,0,1,1],[0,1,0,1]],names=['x','ID'])

Then I have the following df:

        mean  sem
x   ID           
2.5 0   0.50  0.1
    1   0.25  0.1
5.0 0   0.70  0.1
    1   0.80  0.2

I can grab the relevant information from the multi-index and use it to select and plot the correct rows in sequence, one line per ID:

import matplotlib.pyplot as plt

x_values = df.index.levels[0]
ID_values = df.index.levels[1]

for ID in ID_values:
    # Select the rows for this ID, ordered by x
    mean_data = df.loc[[(x,ID) for x in x_values],'mean'].tolist()
    error_data = df.loc[[(x,ID) for x in x_values],'sem'].tolist()
    plt.errorbar(x_values,mean_data,yerr=error_data,label=ID)

plt.legend(title='ID')
plt.show()
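An alternative to the explicit loop, offered only as a sketch: unstack the 'ID' level so that each ID becomes a column, then let DataFrame.plot draw one line per column with yerr taken from a matching frame of errors. The toy frame is rebuilt here so the snippet runs on its own; the axis labels are my own choice and not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same toy data as the df above, rebuilt so this snippet is self-contained.
index = pd.MultiIndex.from_product([[2.5, 5.0], [0, 1]], names=["x", "ID"])
df = pd.DataFrame(
    {"mean": [0.5, 0.25, 0.7, 0.8], "sem": [0.1, 0.1, 0.1, 0.2]},
    index=index,
)

# Move the 'ID' level into the columns: rows are x values, one column per ID.
means = df["mean"].unstack("ID")
errors = df["sem"].unstack("ID")

# One line per column; the error frame is matched to the data by column label.
ax = means.plot(yerr=errors, marker="o", capsize=3)
ax.set_xlabel("x")
ax.set_ylabel("mean of results")
```

Calling plt.show() afterwards displays the figure. With the question's df_group_results the same two unstack calls should apply, assuming its columns are named 'mean' and 'sem' as in the printed output.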
