
How to compare columns in a data frame

I'm trying to visually compare two columns in a data frame, but what I get is a weird table with 'frequency' in place of one of the columns.

I tried these options:

ct1=pd.crosstab(df['releaseyear'],df['score'],normalize=True)
ct1.plot()

df.plot( x='releaseyear', y='score', kind='hist')

I also tried a scatter plot, which gets x and y right, but I don't know how to normalize it so that it shows only the average for each year rather than all the data:

plt.scatter(df['releaseyear'], df['score'])
plt.show()

The question includes no sample data that could be used to reproduce the dataframe, and no clue about what the dataframe looks like.

This answer is based on my understanding that the data looks like this:

   year score
   2001 20
   2001 18
   2002 12
   2002 16

Then first use groupby to group the data by year and apply the required aggregate function:

df=df.groupby('year').mean().reset_index()

Output:

   year  score
0  2001   19.0
1  2002   14.0

You can then plot the data accordingly.
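As a minimal sketch of the whole flow, assuming the data has the same shape as the small sample table above (the frame below is made up for illustration, and the names `yearly_mean`, `fig`, and `ax` are not from the question), the grouping and plotting steps might look like this. With the question's real data, the `year` column would presumably be `releaseyear`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data, mirroring the small table in the answer.
df = pd.DataFrame({
    "year": [2001, 2001, 2002, 2002],
    "score": [20, 18, 12, 16],
})

# Average score per year; as_index=False keeps 'year' as a regular column.
yearly_mean = df.groupby("year", as_index=False)["score"].mean()

# One point per year instead of one point per row.
fig, ax = plt.subplots()
ax.plot(yearly_mean["year"], yearly_mean["score"], marker="o")
ax.set_xlabel("year")
ax.set_ylabel("mean score")
ax.set_xticks(yearly_mean["year"])
# In a script or interactive session, call plt.show() to display the figure.
```

Selecting the `score` column before calling `mean()` keeps the aggregation from touching any other columns the real dataframe may contain, such as non-numeric ones.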
