Plot group vs. mean of variable in group
I have a CSV file in the following format:
BUFFER_SIZE,RUN,DURATION
1000,1,0.5
1000,2,0.62
1000,3,0.48
1000,4,0.59
2000,1,0.44
2000,2,0.35
2000,3,0.29
2000,4,0.41
...
(The data is made up, just to illustrate my example.)
I want to plot buffer_size vs mean(duration).
I can group the rows and compute the means:
bench_results = pd.read_csv('bench_results.csv')
bench_by_size = bench_results.groupby('BUFFER_SIZE')
bench_by_size.mean()
This gives me the expected result.
plot(bench_results.groupby('BUFFER_SIZE').mean()['DURATION'])
is almost what I want, except that I would like the X axis to be BUFFER_SIZE.
This is ugly, but it gives what I want:
Xvals = []
Yvals = []
for key, grp in bench_results.groupby(['BUFFER_SIZE']):
    Xvals.append(key)
    Yvals.append(mean(grp['DURATION']))
plot(Xvals, Yvals)
Is there a better way? I would like to avoid accessing the GroupBy object directly.
plt.plot(bench_by_size.mean()['DURATION'])
should work. For example,
import pandas as pd
import matplotlib.pyplot as plt
bench_results = pd.DataFrame(
    {'BUFFER_SIZE': [1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000],
     'DURATION': [0.5, 0.62, 0.48, 0.59, 0.44, 0.35, 0.29, 0.41],
     'RUN': [1, 2, 3, 4, 1, 2, 3, 4]})
# bench_results = pd.read_csv('data')
bench_by_size = bench_results.groupby('BUFFER_SIZE')
means = bench_by_size.mean()
plt.plot(means['DURATION'], linestyle='-', marker='o', markersize=10)
plt.xlabel(means.index.name)
plt.ylabel('DURATION')
plt.show()
yields a plot of the mean DURATION against BUFFER_SIZE.
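If you would rather not rely on plt.plot picking up the x values from the Series index, you can pass the index explicitly. The following is only a minimal sketch using the same made-up data as in the question. The names mean_duration, fig, ax and line are mine, and the Agg backend line is an assumption so that the snippet runs without a display. Selecting the DURATION column before calling mean() also means RUN is not averaged along the way.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Same made-up data as in the question.
bench_results = pd.DataFrame({
    'BUFFER_SIZE': [1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000],
    'RUN': [1, 2, 3, 4, 1, 2, 3, 4],
    'DURATION': [0.5, 0.62, 0.48, 0.59, 0.44, 0.35, 0.29, 0.41],
})

# Select the column first, so only DURATION is averaged per group.
# The result is a Series indexed by BUFFER_SIZE.
mean_duration = bench_results.groupby('BUFFER_SIZE')['DURATION'].mean()

fig, ax = plt.subplots()
# Pass the group keys (the index) as x and the group means as y.
(line,) = ax.plot(mean_duration.index, mean_duration.values,
                  linestyle='-', marker='o')
ax.set_xlabel(mean_duration.index.name)
ax.set_ylabel('mean DURATION')
# plt.show()  # uncomment when running with an interactive backend
```

This should draw the same line as the example above. The difference is only that the x values are spelled out, which may be easier to read later.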