Plotting a group against the mean of a variable within the group

Question

I have a CSV file in the following format:

BUFFER_SIZE,RUN,DURATION
1000,1,0.5
1000,2,0.62
1000,3,0.48
1000,4,0.59
2000,1,0.44
2000,2,0.35
2000,3,0.29
2000,4,0.41
...

(The data is fake; it is only there to illustrate my example.)

I want to plot buffer_size vs mean(duration).

I can group and compute means without a problem:

import pandas as pd

bench_results = pd.read_csv('bench_results.csv')
bench_by_size = bench_results.groupby('BUFFER_SIZE')
bench_by_size.mean()

which gives me the expected results.

plot(bench_results.groupby('BUFFER_SIZE').mean()['DURATION']) is almost what I want, except that I want the X-axis to be BUFFER_SIZE.

This is ugly but gives what I want:

Xvals = []
Yvals = []
for key, grp in bench_results.groupby('BUFFER_SIZE'):
    Xvals.append(key)
    Yvals.append(mean(grp['DURATION']))
plot(Xvals, Yvals)

Is there a better way to do that? I would like to avoid iterating over the GroupBy object.

Answer 1

plt.plot(bench_by_size.mean()['DURATION']) should work. For example,

import pandas as pd
import matplotlib.pyplot as plt

bench_results = pd.DataFrame(
    {'BUFFER_SIZE': [1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000],
     'DURATION': [0.5, 0.62, 0.48, 0.59, 0.44, 0.35, 0.29, 0.41],
     'RUN': [1, 2, 3, 4, 1, 2, 3, 4]})

# bench_results = pd.read_csv('data')
bench_by_size = bench_results.groupby('BUFFER_SIZE')
means = bench_by_size.mean()
plt.plot(means['DURATION'], linestyle='-', marker='o', markersize=10)
plt.xlabel(means.index.name)
plt.ylabel('DURATION')
plt.show()

yields a line plot of mean DURATION against BUFFER_SIZE.
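A variation on the same idea, offered as a minimal sketch rather than the only way to do it: select the DURATION column before aggregating, so that only that column is averaged, and let pandas' own Series.plot draw the line. The variable name mean_duration and the axis label text are illustrative choices, not part of the original question or answer; the data is the same fake sample.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same illustrative (fake) data as in the question.
bench_results = pd.DataFrame(
    {'BUFFER_SIZE': [1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000],
     'RUN': [1, 2, 3, 4, 1, 2, 3, 4],
     'DURATION': [0.5, 0.62, 0.48, 0.59, 0.44, 0.35, 0.29, 0.41]})

# Select the column first so only DURATION is averaged.
# The result is a Series whose index is BUFFER_SIZE.
mean_duration = bench_results.groupby('BUFFER_SIZE')['DURATION'].mean()

# Series.plot uses the index (BUFFER_SIZE) for the x-axis.
ax = mean_duration.plot(linestyle='-', marker='o')
ax.set_xlabel(mean_duration.index.name)
ax.set_ylabel('mean DURATION')

if __name__ == '__main__':
    plt.show()
```

Because the group keys become the index of the aggregated Series, no explicit loop over the GroupBy object is needed.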
