How can I create multiple histograms with pandas?
I have a csv file with three columns: Full name, Test_A_Score, Test_B_Score. Test_A_Score and Test_B_Score range from 0-10. My aim is to create, for every unique value of Test_A_Score, a histogram of the corresponding Test_B_Score values.
test_scores.csv
Full name Test_A_Score Test_B_Score
Jake Johnson 5 8
Helen Smith 9 6
.
.
.
Jonathan Pierce 3 8
My code so far:
import pandas as pd

df = pd.read_csv('test_scores.csv', delimiter=',', na_values=['-'])

# Get rid of missing scores
df = df[(df['Test_A_Score'] >= 0) & (df['Test_B_Score'] >= 0)]

score_range = range(11)
data = []
for score in score_range:
    scores = df[(df['Test_A_Score'] == score)]['Test_B_Score']
    data.append(scores)

df_hist = pd.DataFrame(data, columns=score_range)
So, I thought I would take the Test B scores for each value in score_range, create a new dataframe, insert the data and plot the histograms of the columns with the following:
import matplotlib.pyplot as plt

plt.figure()
df_hist.hist(color='k', alpha=0.5, bins=20)
The problems are that the scores for each value in score_range don't have the same length, and that the data need to be inserted as rows and not as columns like they are in the list named data.
First of all, you should probably use the .dropna() function to get rid of non-sensible values. Next, I think the groupby() function is your friend if you are looking for 'uniqueness'.
import pandas as pd
import matplotlib.pyplot as plt

frame = pd.DataFrame([['euler', 1, 3],
                      ['gauss', 1, 5],
                      ['fibo', 1, 6],
                      ['schwartz', 2, 3],
                      ['helmholtz', 2, 4],
                      ['mandelbrodt', 3, 4]], columns=['Name', 'a', 'b'])

fig = plt.figure()
ax = [fig.add_subplot(1, 3, i) for i in range(1, 4)]

for index, (a, group) in enumerate(frame.groupby('a')):
    ax[index].hist(group.b.values)
.groupby() returns an object you can iterate over; each iteration gives you the group's name and the group itself. You can then just plot a histogram of the b-values for every group.
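To connect this back to the columns in the question, here is a rough sketch of the same dropna() plus groupby() idea applied to Test_A_Score and Test_B_Score. The rows in csv_text are made up for illustration (only the three names shown in the question are taken from it), the file is assumed to be comma-separated as in the read_csv call, and the Agg backend and bin edges are my own choices, so adapt them as needed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for test_scores.csv; '-' marks a missing score.
csv_text = """Full name,Test_A_Score,Test_B_Score
Jake Johnson,5,8
Helen Smith,9,6
Ann Lee,5,3
Bob Ray,9,-
Cara Diaz,5,7
Jonathan Pierce,3,8
"""

df = pd.read_csv(io.StringIO(csv_text), na_values=["-"])

# Drop rows where either score is missing.
df = df.dropna(subset=["Test_A_Score", "Test_B_Score"])

# One group per unique Test_A_Score; the groups may have different sizes.
groups = df.groupby("Test_A_Score")

# One subplot per group, laid out in a single row.
fig, axes = plt.subplots(1, groups.ngroups, figsize=(4 * groups.ngroups, 3),
                         sharey=True, squeeze=False)

for ax, (a_score, group) in zip(axes.ravel(), groups):
    # Scores are integers from 0 to 10, so use one bin per possible score.
    ax.hist(group["Test_B_Score"], bins=list(range(12)), color="k", alpha=0.5)
    ax.set_title(f"Test_A_Score = {a_score}")
    ax.set_xlabel("Test_B_Score")

fig.tight_layout()
# To view the result, call fig.savefig("histograms.png"),
# or drop the Agg line above and call plt.show().
```

Because each group is plotted on its own axes, the unequal lengths are no longer a problem and no intermediate dataframe is needed. If you do not need control over the individual axes, df.hist(column="Test_B_Score", by="Test_A_Score") should give a similar grid in one call.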