How can I create multiple histograms with pandas?

I have a CSV file with three columns: Full name, Test_A_Score, Test_B_Score. Test_A_Score and Test_B_Score both range from 0 to 10. For every unique value of Test_A_Score, I want to create a histogram of the corresponding Test_B_Score values.

test_scores.csv

Full name      Test_A_Score Test_B_Score
Jake Johnson        5            8
Helen Smith         9            6
   .
   .
   .
Jonathan Pierce     3            8

My code so far:

import pandas as pd

df = pd.read_csv('test_scores.csv', delimiter=',',  na_values=['-']) 

# Get rid of missing scores
df = df[(df['Test_A_Score'] >= 0) & (df['Test_B_Score'] >= 0)]

score_range = range(11)

data = []
for score in score_range:
    scores = df[(df['Test_A_Score'] == score)]['Test_B_Score']
    data.append(scores)

df_hist = pd.DataFrame(data, columns=score_range)

So I thought I would collect the Test B scores for each value in score_range, put them in a new DataFrame, and plot histograms of its columns with the following:

import matplotlib.pyplot as plt

plt.figure()
df_hist.hist(color='k', alpha=0.5, bins=20)

The problems are that the scores for each value in score_range don't all have the same length, and that the data need to be inserted as rows, not as columns as they are in the list named data.

First of all, you should probably use .dropna() to get rid of the missing values. Next, I think groupby() is your friend whenever you want to handle each unique value separately.

import pandas as pd
import matplotlib.pyplot as plt

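# Small example frame: 'a' plays the role of Test_A_Score, 'b' of Test_B_Score
frame = pd.DataFrame([['euler', 1, 3],
                      ['gauss', 1, 5],
                      ['fibo', 1, 6],
                      ['schwartz', 2, 3],
                      ['helmholtz', 2, 4],
                      ['mandelbrodt', 3, 4]], columns=['Name', 'a', 'b'])

# One subplot per unique value of 'a' (three in this example)
fig = plt.figure()
ax = [fig.add_subplot(1, 3, i) for i in range(1, 4)]

# Plot a histogram of the 'b' values of each group on its own axes
for index, (a, group) in enumerate(frame.groupby('a')):
    ax[index].hist(group.b.values)

plt.show()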

.groupby() returns an object you can iterate over; each iteration yields the group's name and the group itself. You can then simply plot a histogram of the b values for every group.
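Applied to the column names from the question, the same idea might look like the sketch below. The rows are made up and stand in for test_scores.csv. The Agg backend, the output filename histograms.png and the choice of one bin per integer score are assumptions for illustration, not something from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for test_scores.csv (made-up rows, same column names)
df = pd.DataFrame({
    "Full name": ["A", "B", "C", "D", "E", "F"],
    "Test_A_Score": [5, 5, 9, 9, 3, None],
    "Test_B_Score": [8, 6, 6, 7, 8, 4],
})

# Drop rows where either score is missing
df = df.dropna(subset=["Test_A_Score", "Test_B_Score"])

# One group, and therefore one subplot, per unique Test_A_Score
groups = df.groupby("Test_A_Score")
fig, axes = plt.subplots(1, groups.ngroups,
                         figsize=(4 * groups.ngroups, 3), squeeze=False)

for ax, (a_score, group) in zip(axes.ravel(), groups):
    # Bin edges 0, 1, ..., 11 give one bin per integer score from 0 to 10
    ax.hist(group["Test_B_Score"], bins=range(12))
    ax.set_title(f"Test_A_Score = {a_score:g}")
    ax.set_xlabel("Test_B_Score")

fig.tight_layout()
fig.savefig("histograms.png")  # hypothetical output filename
```

Because each group is plotted on its own axes, the groups do not need to have the same length. For a quick look, pandas can also do the grouping itself with df.hist(column="Test_B_Score", by="Test_A_Score").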
