Plot 2 histograms with different length of data points in one graph using matplotlib

I have two sets of data, one containing around 11 million data points and the other around 5000. I would like to plot them both on one histogram, but because of the difference in size I need to normalise the frequency so that they fit on the same figure. Below I have simulated what I did with my data in order to plot them. I used normed=True.

import random

import matplotlib.pyplot as plt
from numpy.random import randn

# Small sample: uniform values between 1 and 2
datalist1 = [random.uniform(1, 2) for _ in range(1, 50000)]

# Large sample: standard normal values
datalist2 = randn(5000000)

fig = plt.figure(1)

# density=True is the current spelling of the older normed=True argument
plt.hist(datalist1, bins=20, color='b', alpha=0.3, label='theoretical', histtype='stepfilled', density=True)
plt.hist(datalist2, bins=20, alpha=0.5, color='g', label='experimental', histtype='stepfilled', density=True)
plt.xlabel("Value")
plt.ylabel("Normalised Frequency")
plt.legend()
plt.show()


Can you please tell me if this is a good way to get around this issue? I would like the tallest height in each of the two histograms to be 1 (or 100%).

The normed=True setting (spelled density=True in newer Matplotlib releases, which no longer accept normed) normalizes the histogram to an area of 1. That gives the histogram an interpretation as an estimate of a probability density function.

In short, it actually makes sense to normalize on the area rather than on the peak.
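To make the area claim concrete, here is a minimal sketch that checks it numerically with numpy.histogram and density=True. The sample sizes, the seed, and the helper name histogram_area are assumptions chosen for illustration; they are not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
small = rng.uniform(1, 2, size=5_000)       # stand-in for the small dataset
large = rng.standard_normal(500_000)        # stand-in for the large dataset


def histogram_area(data, bins=20):
    """Return the total area under a density-normalised histogram."""
    density, edges = np.histogram(data, bins=bins, density=True)
    # Area = sum over bins of (bar height * bin width).
    return float(np.sum(density * np.diff(edges)))


area_small = histogram_area(small)
area_large = histogram_area(large)
print(area_small, area_large)  # both are 1 up to floating-point rounding
```

Because both areas come out as 1 regardless of sample size, the two histograms are directly comparable as density estimates even though one dataset is far larger than the other.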

But if you really want to normalize by height, you can modify the polygon data of the histogram:

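# hist() returns (values, bin_edges, patches); with histtype='stepfilled'
# the patches list holds a single Polygon for the dataset.
# density=True is the current spelling of the older normed=True argument.
h = plt.hist(datalist1, bins=20, color='b', alpha=0.3, label='theoretical', histtype='stepfilled', density=True)
p = h[2][0]
# Rescale the polygon's y-coordinates in place so that the peak is 1.
p.xy[:, 1] /= p.xy[:, 1].max()
h = plt.hist(datalist2, bins=20, alpha=0.5, color='g', label='experimental', histtype='stepfilled', density=True)
p = h[2][0]
p.xy[:, 1] /= p.xy[:, 1].max()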

This solution feels a bit hackish, but at least it's quick and dirty :)
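As a possible alternative that avoids editing the polygon after the fact, the bin counts can be computed first with numpy.histogram, divided by their maximum, and then drawn as pre-binned data through the weights argument of hist. This is only a sketch of that idea, not the approach from the answer above: the helper names peak_normalised and plot_peak_normalised are hypothetical, and the sample data is simulated.

```python
import numpy as np


def peak_normalised(data, bins=20):
    """Return (heights, edges) with the tallest bin scaled to 1."""
    counts, edges = np.histogram(data, bins=bins)
    # Assumes data is non-empty, so counts.max() is greater than zero.
    return counts / counts.max(), edges


def plot_peak_normalised(ax, data, bins=20, **hist_kwargs):
    """Draw the rescaled histogram on an existing Matplotlib Axes (ax)."""
    heights, edges = peak_normalised(data, bins=bins)
    # Pre-binned data: one sample per bin (its left edge), weighted by height.
    return ax.hist(edges[:-1], bins=edges, weights=heights, **hist_kwargs)


rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
heights1, edges1 = peak_normalised(rng.uniform(1, 2, size=5_000))
heights2, edges2 = peak_normalised(rng.standard_normal(500_000))
print(heights1.max(), heights2.max())
```

To plot, create an Axes with fig, ax = plt.subplots() and call plot_peak_normalised(ax, datalist1, color='b', alpha=0.3, label='theoretical', histtype='stepfilled'), then do the same for datalist2. Since the heights are passed to hist directly, the y-axis limits are computed from the rescaled values rather than from the original density.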

