
Plot two histograms on a single chart with matplotlib

I created a histogram plot using data from a file with no problem. Now I want to superimpose data from another file on the same histogram, so I do something like this:

n, bins, patches = ax.hist(mydata1, 100)
n, bins, patches = ax.hist(mydata2, 100)

But the problem is that for each interval, only the bar with the highest value appears, and the other is hidden. How can I plot both histograms at the same time with different colors?
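One detail that contributes to the problem in the code above: when bins is an integer, each hist call picks its own bin edges from the minimum and maximum of its own data, so the two sets of bars generally do not line up. The sketch below illustrates this with numpy.histogram (which hist uses internally to compute the counts); the Gaussian samples are made-up stand-ins for mydata1 and mydata2.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-ins for mydata1 and mydata2 from the question
mydata1 = rng.normal(3, 1, 400)
mydata2 = rng.normal(4, 2, 400)

# With an integer bin count, each call spans its own data's min..max
_, edges1 = np.histogram(mydata1, 100)
_, edges2 = np.histogram(mydata2, 100)

print(edges1[0] == mydata1.min(), edges2[0] == mydata2.min())  # True True
print(np.allclose(edges1, edges2))  # False: the two sets of bars are not aligned
```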

Here is a working example:

import random
import numpy
from matplotlib import pyplot

x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]

bins = numpy.linspace(-10, 10, 100)

pyplot.hist(x, bins, alpha=0.5, label='x')
pyplot.hist(y, bins, alpha=0.5, label='y')
pyplot.legend(loc='upper right')
pyplot.show()

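Since the question uses an Axes object (ax.hist), here is a sketch of the same idea in the object-oriented style: pass one shared array of bin edges to both calls and make the bars semi-transparent with alpha. The random data is illustrative only, and the Agg backend is selected so the sketch runs without a display (use plt.show() interactively instead).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(3, 1, 400)   # illustrative data
y = rng.normal(4, 2, 400)

bins = np.linspace(-10, 10, 100)  # shared bin edges for both histograms

fig, ax = plt.subplots()
n_x, _, patches_x = ax.hist(x, bins, alpha=0.5, label="x")
n_y, _, patches_y = ax.hist(y, bins, alpha=0.5, label="y")
ax.legend(loc="upper right")
```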

The accepted answer gives the code for a histogram with overlapping bars, but in case you want the bars side by side (as I did), try the variation below:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)

plt.hist([x, y], bins, label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()

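When a list of datasets is passed, hist bins all of them with the same edges and returns the counts as one row per dataset, which is handy if you need the numbers as well as the plot. A small check, assuming the same kind of data as above:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(1, 2, 5000)
y = rng.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)   # 30 edges -> 29 bins

n, edges, patches = plt.hist([x, y], bins, label=["x", "y"])

# One row of counts per dataset, one column per bin
print(np.shape(n))  # (2, 29)
# Each row matches a plain numpy.histogram with the same edges
print(np.array_equal(n[0], np.histogram(x, bins)[0]))  # True
```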

Reference: http://matplotlib.org/examples/statistics/histogram_demo_multihist.html

EDIT [2018/03/16]: Updated to allow plotting of arrays of different sizes, as suggested by @stochastic_zeitgeist

If you have different sample sizes, it may be difficult to compare the distributions with a single y-axis. For example:

import numpy as np
import matplotlib.pyplot as plt

#makes the data
y1 = np.random.normal(-2, 2, 1000)
y2 = np.random.normal(2, 2, 5000)
colors = ['b','g']

#plots the histogram
fig, ax1 = plt.subplots()
ax1.hist([y1,y2],color=colors)
ax1.set_xlim(-10,10)
ax1.set_ylabel("Count")
plt.tight_layout()
plt.show()


In this case, you can plot your two data sets on different axes. To do so, get your histogram data using matplotlib, clear the axis, and then re-plot it on two separate axes (shifting the bin edges so that the bars don't overlap):

#sets up the axis and gets histogram data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
n, bins, patches = ax1.hist([y1, y2])  # draw once just to get counts and bin edges
ax1.cla() #clear the axis

#plots the histogram data
width = (bins[1] - bins[0]) * 0.4
bins_shifted = bins + width
ax1.bar(bins[:-1], n[0], width, align='edge', color=colors[0])
ax2.bar(bins_shifted[:-1], n[1], width, align='edge', color=colors[1])

#finishes the plot
ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params('y', colors=colors[0])
ax2.tick_params('y', colors=colors[1])
plt.tight_layout()
plt.show()

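A variation on the twin-axis approach, swapping in numpy.histogram to compute the counts directly instead of drawing a histogram and then clearing the axis. This is a sketch with illustrative data; the 0.4 width factor and the colors are carried over from the answer above, and computing 10 edges over both samples is meant to mirror hist's default bins=10 for a list of datasets.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
y1 = rng.normal(-2, 2, 1000)
y2 = rng.normal(2, 2, 5000)
colors = ["b", "g"]

# Shared edges computed from both samples, then counts per sample
bins = np.histogram_bin_edges(np.concatenate([y1, y2]), bins=10)
n1, _ = np.histogram(y1, bins)
n2, _ = np.histogram(y2, bins)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
width = (bins[1] - bins[0]) * 0.4
ax1.bar(bins[:-1], n1, width, align="edge", color=colors[0])
ax2.bar(bins[:-1] + width, n2, width, align="edge", color=colors[1])
ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params("y", colors=colors[0])
ax2.tick_params("y", colors=colors[1])
fig.tight_layout()
```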

As a complement to Gustavo Bezerra's answer:

If you want the bar heights of each histogram to sum to 1, you cannot just use normed=True (mpl<=2.1) or density=True (mpl>=3.1), because those normalize the area under each histogram instead; you need to set a weight for each value:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
x_w = np.empty(x.shape)
x_w.fill(1/x.shape[0])
y_w = np.empty(y.shape)
y_w.fill(1/y.shape[0])
bins = np.linspace(-10, 10, 30)

plt.hist([x, y], bins, weights=[x_w, y_w], label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()


For comparison, using the exact same x and y vectors with default weights and density=True normalizes each histogram so that its bar areas, rather than its bar heights, sum to 1.

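The difference between the two normalizations can be checked numerically without plotting. A sketch with numpy, assuming bins that cover every sample (values outside the edges are not counted, so the weighted heights would then sum to less than 1): weighting each sample by 1/len(data) makes the heights sum to 1, whereas density=True makes height times bin width sum to 1.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(1, 2, 5000)
y = rng.normal(-1, 3, 2000)
# Edges spanning all samples, so no value falls outside the bins
bins = np.histogram_bin_edges(np.concatenate([x, y]), bins=30)
widths = np.diff(bins)

# Weighted: each sample counts 1/len(data), so heights sum to 1
h_w, _ = np.histogram(y, bins, weights=np.full_like(y, 1 / len(y)))
# density=True: counts / (total * bin width), so areas sum to 1
h_d, _ = np.histogram(y, bins, density=True)

print(round(h_w.sum(), 6))             # 1.0
print(round((h_d * widths).sum(), 6))  # 1.0
```

The two results differ only by the bin width: each density value equals the weighted height divided by the width of its bin.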

You should reuse the bins returned by the first hist call, so that both histograms share the same bin edges:

import numpy as np
import matplotlib.pyplot as plt

foo = np.random.normal(loc=1, size=100) # a normal distribution
bar = np.random.normal(loc=-1, size=10000) # a normal distribution

# density replaces normed, which was removed in matplotlib 3.1
_, bins, _ = plt.hist(foo, bins=50, range=[-6, 6], density=True)
_ = plt.hist(bar, bins=bins, alpha=0.5, density=True)
plt.show()

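Reusing the edges from the first call works, but any value in the second dataset that falls outside those edges is silently left out. An alternative sketch, swapping in numpy.histogram_bin_edges (available since NumPy 1.15), computes one set of edges from both datasets up front and passes it to both hist calls; the data is illustrative.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
foo = rng.normal(loc=1, size=100)
bar = rng.normal(loc=-1, size=10000)

# One set of edges covering the union of both samples
bins = np.histogram_bin_edges(np.concatenate([foo, bar]), bins=50)

n_foo, _, _ = plt.hist(foo, bins=bins, density=True)
n_bar, _, _ = plt.hist(bar, bins=bins, alpha=0.5, density=True)
```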

Here is a simple method to plot two histograms side by side on the same plot when the data sets have different sizes:

import matplotlib.pyplot as plt

def plotHistogram(p, o):
    """
    p and o are iterables with the values you want to
    plot the histogram of; they may have different lengths
    """
    plt.hist([p, o], color=['g', 'r'], alpha=0.8, bins=50)
    plt.show()
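A slightly more general sketch of the same idea; the name plot_side_by_side and its parameters are hypothetical. It draws onto a given Axes, accepts any number of datasets, and leaves plt.show() to the caller, which makes it easier to reuse inside subplots.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

def plot_side_by_side(ax, datasets, labels, bins=50, alpha=0.8):
    """Hypothetical helper: draw several histograms side by side on ax.

    The datasets may have different lengths; hist bins them with shared edges.
    """
    n, edges, _ = ax.hist(datasets, bins=bins, alpha=alpha, label=labels)
    ax.legend()
    return n, edges

rng = np.random.default_rng(6)
p = rng.normal(0, 1, 300)
o = rng.normal(1, 1, 1200)
fig, ax = plt.subplots()
n, edges = plot_side_by_side(ax, [p, o], ["p", "o"], bins=20)
```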

Another option, quite similar to joaquin's answer:

import random
from matplotlib import pyplot

#random data
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]

#plot both histograms(range from -10 to 10), bins set to 100
pyplot.hist([x,y], bins= 100, range=[-10,10], alpha=0.5, label=['x', 'y'])
#plot legend
pyplot.legend(loc='upper right')
#show it
pyplot.show()

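Passing bins=100 together with range=[-10, 10] fixes the edges independently of the data, so both datasets end up on the same grid. With numpy you can check that these edges are 101 evenly spaced points from -10 to 10 (100 bins need 101 edges); the data below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(3, 1, 400)
y = rng.normal(4, 2, 400)

# Same bin count and range -> identical edges regardless of the data
_, edges_x = np.histogram(x, bins=100, range=(-10, 10))
_, edges_y = np.histogram(y, bins=100, range=(-10, 10))

print(len(edges_x))                                     # 101
print(np.array_equal(edges_x, edges_y))                 # True
print(np.allclose(edges_x, np.linspace(-10, 10, 101)))  # True
```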

Plotting two (or more) overlapping histograms can lead to a rather cluttered plot. I find that using step histograms (aka hollow histograms) improves readability quite a bit. The only downside is that in matplotlib the default legend for a step histogram is not properly formatted, so it can be edited as in the following example:

import numpy as np                   # v 1.19.2
import matplotlib.pyplot as plt      # v 3.3.2
from matplotlib.lines import Line2D

rng = np.random.default_rng(seed=123)

# Create two normally distributed random variables of different sizes
# and with different shapes
data1 = rng.normal(loc=30, scale=10, size=500)
data2 = rng.normal(loc=50, scale=10, size=1000)

# Create figure with 'step' type of histogram to improve plot readability
fig, ax = plt.subplots(figsize=(9,5))
ax.hist([data1, data2], bins=15, histtype='step', linewidth=2,
        alpha=0.7, label=['data1','data2'])

# Edit legend to get lines as legend keys instead of the default polygons
# and sort the legend entries in alphanumeric order
handles, labels = ax.get_legend_handles_labels()
leg_entries = {}
for h, label in zip(handles, labels):
    leg_entries[label] = Line2D([0], [0], color=h.get_facecolor()[:-1],
                                alpha=h.get_alpha(), lw=h.get_linewidth())
labels_sorted, lines = zip(*sorted(leg_entries.items()))
ax.legend(lines, labels_sorted, frameon=False)

# Remove spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Add annotations
plt.ylabel('Frequency', labelpad=15)
plt.title('Matplotlib step histogram', fontsize=14, pad=20)
plt.show()


As you can see, the result looks quite clean. This is especially useful when overlapping more than two histograms. Depending on how the variables are distributed, this can work for up to around 5 overlapping distributions. More than that would require another type of plot, such as one of those presented here.
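A possibly simpler way to get line-style legend keys, assuming you choose the colors yourself: build the Line2D handles directly from the same color list passed to hist, rather than reading the colors back from the patches. The data, colors and labels below are illustrative.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

rng = np.random.default_rng(123)
data1 = rng.normal(loc=30, scale=10, size=500)
data2 = rng.normal(loc=50, scale=10, size=1000)
colors = ["tab:blue", "tab:orange"]
labels = ["data1", "data2"]

fig, ax = plt.subplots(figsize=(9, 5))
ax.hist([data1, data2], bins=15, histtype="step", linewidth=2,
        color=colors, alpha=0.7)

# Line handles that mirror the step outlines (same color, width, alpha)
handles = [Line2D([], [], color=c, lw=2, alpha=0.7) for c in colors]
ax.legend(handles, labels, frameon=False)
```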

In case you have pandas (import pandas as pd) or are OK with using it:

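import random
import matplotlib.pyplot as plt
import pandas as pd

test = pd.DataFrame([[random.gauss(3, 1) for _ in range(400)],
                     [random.gauss(4, 2) for _ in range(400)]])
plt.hist(test.values.T)  # transpose: one column per dataset
plt.show()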
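If the data already lives in a DataFrame with one column per dataset, pandas' own plotting wrapper can draw the overlapping histograms and label them by column name. A sketch, assuming pandas with its default matplotlib backend; the column names x and y are illustrative.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd

rng = np.random.default_rng(8)
df = pd.DataFrame({
    "x": rng.normal(3, 1, 400),   # one column per dataset
    "y": rng.normal(4, 2, 400),
})

# Each column becomes its own histogram, drawn with shared semi-transparent bins
ax = df.plot.hist(bins=np.linspace(-10, 10, 100), alpha=0.5)
```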

There is one caveat when you want to plot histograms from a 2-D numpy array: hist treats each column as a separate dataset, so you need to swap the two axes.

import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(size=(2, 300))
# swapped_data.shape == (300, 2)
swapped_data = np.swapaxes(data, axis1=0, axis2=1)
plt.hist(swapped_data, bins=30, label=['x', 'y'])
plt.legend()
plt.show()

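A quick check of why the swap matters, using the shape of the returned counts (illustrative data; data.T is an equivalent shorthand for the swap). Without it, the (2, 300) array is read as 300 datasets of 2 values each.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(9)
data = rng.normal(size=(2, 300))   # 2 rows = 2 datasets of 300 values

fig, ax = plt.subplots()
n, bins, patches = ax.hist(data.T, bins=30, label=["x", "y"])
print(np.shape(n))  # (2, 30): two histograms of 30 bins each

# Un-swapped: each of the 300 columns becomes its own tiny dataset
fig2, ax2 = plt.subplots()
n_wrong, _, _ = ax2.hist(data, bins=5)
print(np.shape(n_wrong))  # (300, 5)
```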

This question has been answered before, but I wanted to add another quick and easy workaround that might help other visitors:

import matplotlib.pyplot as plt
import seaborn as sns

sns.kdeplot(mydata1)
sns.kdeplot(mydata2)
plt.show()

Some helpful examples of KDE vs. histogram comparison are here.

Inspired by Solomon's answer, but sticking with the question, which is about histograms, a clean solution is:

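import matplotlib.pyplot as plt
import seaborn as sns  # note: distplot is deprecated since seaborn 0.11

sns.distplot(bar)  # foo and bar as defined in an earlier answer
sns.distplot(foo)
plt.show()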

Make sure to plot the taller one first; otherwise you would need to set plt.ylim(0, 0.45) so that the taller histogram is not chopped off.
