Plotting two histograms on a single chart with matplotlib

Question

I created a histogram plot using data from a file with no problem. Now I want to superimpose data from another file on the same histogram, so I do something like this:

n, bins, patches = ax.hist(mydata1, 100)
n, bins, patches = ax.hist(mydata2, 100)

But the problem is that, for each interval, only the bar with the highest value appears and the other one is hidden. How can I plot both histograms at the same time with different colors?

Answer 1

Here is a working example:

import random
import numpy
from matplotlib import pyplot

x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]

bins = numpy.linspace(-10, 10, 100)

pyplot.hist(x, bins, alpha=0.5, label='x')
pyplot.hist(y, bins, alpha=0.5, label='y')
pyplot.legend(loc='upper right')
pyplot.show()
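The same overlay can be written with the object-oriented Axes API, which makes it easy to inspect what hist returns. This is a minimal sketch that assumes NumPy's default_rng for reproducible example data and the non-interactive Agg backend so it runs without a display. The essential points are the same as above: both calls receive the same bins array, so the bars line up, and alpha=0.5 lets the overlap show through.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # reproducible example data (assumed)
x = rng.normal(3, 1, 400)
y = rng.normal(4, 2, 400)

# One shared array of bin edges for both datasets, so the bars line up
bins = np.linspace(-10, 10, 100)

fig, ax = plt.subplots()
n_x, _, _ = ax.hist(x, bins, alpha=0.5, label="x")  # alpha < 1 keeps the overlap visible
n_y, _, _ = ax.hist(y, bins, alpha=0.5, label="y")
ax.legend(loc="upper right")
# plt.show() or fig.savefig("hist.png") would display or save the figure
```

Note that np.linspace(-10, 10, 100) gives 100 edges, i.e. 99 bins, and any value outside [-10, 10] is simply not counted.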


Answer 2

The accepted answer gives the code for a histogram with overlapping bars, but in case you want the bars side by side (as I did), try the variation below:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)

plt.hist([x, y], bins, label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()

Reference: http://matplotlib.org/examples/statistics/histogram_demo_multihist.html

EDIT [2018/03/16]: Updated to allow plotting of arrays of different sizes, as suggested by @stochastic_zeitgeist.
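When hist receives a list of arrays, it draws one set of bars per dataset (placed side by side within each bin) and returns n with one counts array per dataset, so each dataset's counts can be inspected separately. A minimal sketch with small deterministic toy arrays of different lengths (the values are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

# Toy data of different lengths (illustrative values)
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([-1.0, 0.5, 1.2])
bins = np.linspace(-2, 4, 7)  # edges -2, -1, 0, 1, 2, 3, 4: six bins of width 1

fig, ax = plt.subplots()
# n has one row of counts per dataset; patches has one bar container per dataset
n, edges, patches = ax.hist([x, y], bins, label=["x", "y"])
ax.legend(loc="upper right")
```

Here n[0] holds the counts for x (0, 0, 0, 2, 2, 1) and n[1] those for y (0, 1, 1, 1, 0, 0); a value equal to an inner edge falls into the bin to its right.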

Answer 3

If you have different sample sizes, it may be difficult to compare the distributions with a single y-axis. For example:

import numpy as np
import matplotlib.pyplot as plt

#makes the data
y1 = np.random.normal(-2, 2, 1000)
y2 = np.random.normal(2, 2, 5000)
colors = ['b','g']

#plots the histogram
fig, ax1 = plt.subplots()
ax1.hist([y1,y2],color=colors)
ax1.set_xlim(-10,10)
ax1.set_ylabel("Count")
plt.tight_layout()
plt.show()

In this case, you can plot your two data sets on different axes. To do so, you can get your histogram data using matplotlib, clear the axis, and then re-plot it on two separate axes (shifting the bin edges so that the bars don't overlap):

#sets up the axis and gets histogram data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
n, bins, patches = ax1.hist([y1, y2])  # drawn only to get the counts and bin edges
ax1.cla() #clear the axis

#plots the histogram data
width = (bins[1] - bins[0]) * 0.4
bins_shifted = bins + width
ax1.bar(bins[:-1], n[0], width, align='edge', color=colors[0])
ax2.bar(bins_shifted[:-1], n[1], width, align='edge', color=colors[1])

#finishes the plot
ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params('y', colors=colors[0])
ax2.tick_params('y', colors=colors[1])
plt.tight_layout()
plt.show()
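Above, ax1.hist is only called to obtain the counts, which is why the axis has to be cleared afterwards. A variant, sketched here under the assumption that you only need counts on shared bin edges, computes them with np.histogram and np.histogram_bin_edges so no throwaway histogram is drawn. The data, seeds and colors are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
y1 = rng.normal(-2, 2, 1000)
y2 = rng.normal(2, 2, 5000)
colors = ["b", "g"]

# Shared bin edges spanning both datasets, then counts without drawing anything
bins = np.histogram_bin_edges(np.concatenate([y1, y2]), bins=10)
n1, _ = np.histogram(y1, bins)
n2, _ = np.histogram(y2, bins)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

# Each dataset uses 40% of the bin width; the second is shifted right so the bars don't overlap
width = (bins[1] - bins[0]) * 0.4
ax1.bar(bins[:-1], n1, width, align="edge", color=colors[0])
ax2.bar(bins[:-1] + width, n2, width, align="edge", color=colors[1])

ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params("y", colors=colors[0])
ax2.tick_params("y", colors=colors[1])
fig.tight_layout()
```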

Answer 4

As a complement to Gustavo Bezerra's answer:

If you want each histogram to be normalized so that its bar heights sum to 1 (normed for mpl <= 2.1 and density for mpl >= 3.1), you cannot just use normed/density=True, because that normalizes the area under each histogram rather than the sum of its heights. Instead, you need to set a weight for each value:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
x_w = np.empty(x.shape)
x_w.fill(1/x.shape[0])
y_w = np.empty(y.shape)
y_w.fill(1/y.shape[0])
bins = np.linspace(-10, 10, 30)

plt.hist([x, y], bins, weights=[x_w, y_w], label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()

As a comparison, here are the exact same x and y vectors with default weights and density=True:
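The difference between the two normalizations can be checked numerically with np.histogram, which accepts the same weights and density arguments. A minimal sketch with a toy array (assumed): with weights of 1/len(data) the bar heights sum to 1, whereas with density=True it is the area (height times bin width) that sums to 1, so the two sets of heights differ by a factor of the bin width.

```python
import numpy as np

x = np.array([0.5, 1.5, 1.5, 2.5])  # toy data, every value inside the bin range
bins = np.linspace(0, 4, 9)         # 8 bins of width 0.5

# Relative frequency: each value weighs 1/N, so the heights sum to 1
freq, _ = np.histogram(x, bins, weights=np.ones_like(x) / len(x))

# Density: heights are scaled so that sum(height * bin_width) == 1
dens, _ = np.histogram(x, bins, density=True)

width = bins[1] - bins[0]
print(freq.sum())                       # 1.0
print((dens * width).sum())             # 1.0
print(np.allclose(dens, freq / width))  # True
```

With wider or narrower bins the density heights change while the relative-frequency heights keep summing to 1, which is why the two plots look different.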

Answer 5

You should reuse the bins returned by the first hist call for the second one:

import numpy as np
import matplotlib.pyplot as plt

foo = np.random.normal(loc=1, size=100) # a normal distribution
bar = np.random.normal(loc=-1, size=10000) # a normal distribution

# density replaces the old normed argument, which was removed in matplotlib 3.1
_, bins, _ = plt.hist(foo, bins=50, range=[-6, 6], density=True)
_ = plt.hist(bar, bins=bins, alpha=0.5, density=True)
plt.show()
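Reusing the returned bins matters most when the first call lets the data choose the edges. A minimal NumPy sketch with toy arrays (assumed): np.histogram with an integer bins argument spans each array's own minimum to maximum, so two arrays get different edges unless the edges from the first are passed to the second.

```python
import numpy as np

foo = np.array([0.0, 1.0, 2.0, 4.0])  # toy data
bar = np.array([-3.0, 0.0, 5.0])

# With an integer bins argument, the edges span each array's own [min, max]
_, foo_edges = np.histogram(foo, bins=4)       # edges 0, 1, 2, 3, 4
_, bar_own_edges = np.histogram(bar, bins=4)   # edges -3, -1, 1, 3, 5: a different grid

# Passing foo's edges puts bar on the same grid; values outside it (-3 and 5) are not counted
bar_counts, bar_edges = np.histogram(bar, bins=foo_edges)
```

This is also why the first call above fixes range=[-6, 6]: values of bar outside the shared edges would otherwise be silently dropped.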

Answer 6

Here is a simple method to plot two histograms, with their bars side by side, on the same plot when the data have different sizes:

import matplotlib.pyplot as plt

def plotHistogram(p, o):
    """
    p and o are iterables with the values you want to
    plot the histogram of
    """
    plt.hist([p, o], color=['g', 'r'], alpha=0.8, bins=50)
    plt.show()
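The same idea extends to any number of datasets. Below is a sketch of a variant (the name plot_histograms and its parameters are hypothetical, not part of matplotlib) that takes the datasets as separate arguments and returns the figure and axes instead of calling plt.show(), which makes it easier to reuse or test.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

def plot_histograms(*datasets, labels=None, bins=50, alpha=0.8):
    """Hypothetical helper: side-by-side histograms of any number of datasets.

    The datasets may have different lengths; hist() computes one shared set of bins.
    """
    fig, ax = plt.subplots()
    ax.hist(list(datasets), bins=bins, alpha=alpha, label=labels)
    if labels is not None:
        ax.legend()
    return fig, ax

rng = np.random.default_rng(2)
fig, ax = plot_histograms(rng.normal(0, 1, 300), rng.normal(1, 2, 1000),
                          labels=["p", "o"], bins=20)
```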

Answer 7

Another option, quite similar to joaquin's answer:

import random
from matplotlib import pyplot

#random data
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]

#plot both histograms(range from -10 to 10), bins set to 100
pyplot.hist([x,y], bins= 100, range=[-10,10], alpha=0.5, label=['x', 'y'])
#plot legend
pyplot.legend(loc='upper right')
#show it
pyplot.show()

This gives the following output:
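Passing an integer bins together with range, as above, makes hist build evenly spaced edges over that range, so both datasets share the same bins just as with an explicit linspace. One detail worth checking, sketched with np.histogram_bin_edges (which follows the same convention for an integer bins and a range): bins=100 means 100 bins, i.e. 101 edges, whereas np.linspace(-10, 10, 100) in the accepted answer yields 100 edges and therefore 99 bins.

```python
import numpy as np

# Integer bins + range: evenly spaced edges over the range, independent of the data values
edges_range = np.histogram_bin_edges([0.0], bins=100, range=(-10, 10))
edges_linspace = np.linspace(-10, 10, 100)

print(len(edges_range) - 1)     # 100
print(len(edges_linspace) - 1)  # 99
print(np.allclose(edges_range, np.linspace(-10, 10, 101)))  # True
```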

Answer 8

Plotting two (or more) overlapping histograms can lead to a rather cluttered plot. I find that using step histograms (aka hollow histograms) improves readability quite a bit. The only downside is that in matplotlib the default legend for a step histogram uses hollow polygons as legend keys rather than lines, so it can be edited as in the following example:

import numpy as np                   # v 1.19.2
import matplotlib.pyplot as plt      # v 3.3.2
from matplotlib.lines import Line2D

rng = np.random.default_rng(seed=123)

# Create two normally distributed random variables of different sizes
# and with different shapes
data1 = rng.normal(loc=30, scale=10, size=500)
data2 = rng.normal(loc=50, scale=10, size=1000)

# Create figure with 'step' type of histogram to improve plot readability
fig, ax = plt.subplots(figsize=(9,5))
ax.hist([data1, data2], bins=15, histtype='step', linewidth=2,
        alpha=0.7, label=['data1','data2'])

# Edit legend to get lines as legend keys instead of the default polygons
# and sort the legend entries in alphanumeric order
handles, labels = ax.get_legend_handles_labels()
leg_entries = {}
for h, label in zip(handles, labels):
    leg_entries[label] = Line2D([0], [0], color=h.get_facecolor()[:-1],
                                alpha=h.get_alpha(), lw=h.get_linewidth())
labels_sorted, lines = zip(*sorted(leg_entries.items()))
ax.legend(lines, labels_sorted, frameon=False)

# Remove spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Add annotations
plt.ylabel('Frequency', labelpad=15)
plt.title('Matplotlib step histogram', fontsize=14, pad=20)
plt.show()

As you can see, the result looks quite clean. This is especially useful when overlapping more than two histograms. Depending on how the variables are distributed, this can work for up to around 5 overlapping distributions. More than that would require another type of plot, such as one of those presented here.
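For several datasets it can be convenient to draw the step histograms in a loop over one shared set of edges, so every outline is directly comparable. A minimal sketch with five illustrative normal samples (sizes, locations and labels are assumptions). With histtype='step', each call adds a single outline patch rather than one rectangle per bin; the legend here keeps the default keys, and the Line2D replacement shown above can be applied unchanged.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(123)
datasets = {f"data{i}": rng.normal(loc=20 + 10 * i, scale=10, size=500) for i in range(5)}

# One common set of edges covering every dataset
bins = np.histogram_bin_edges(np.concatenate(list(datasets.values())), bins=20)

fig, ax = plt.subplots(figsize=(9, 5))
for label, data in datasets.items():
    # Each call draws one hollow outline on the shared edges
    ax.hist(data, bins=bins, histtype="step", linewidth=2, label=label)
ax.legend(frameon=False)
```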

Answer 9

It sounds like you might just want a bar graph:

Alternatively, you can use subplots.
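A minimal sketch of the subplots alternative, assuming NumPy-generated stand-ins for the question's mydata1 and mydata2: each dataset gets its own axes, stacked vertically, with the same bin edges, and sharex=True keeps the x-limits of both panels identical so the distributions can be compared by eye.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
mydata1 = rng.normal(3, 1, 400)   # stand-in data (assumed)
mydata2 = rng.normal(4, 2, 400)
bins = np.linspace(-10, 10, 41)   # shared edges: 40 bins of width 0.5

# Two rows, one column; sharex links the x-axes of both panels
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)
ax_top.hist(mydata1, bins, color="tab:blue")
ax_top.set_title("mydata1")
ax_bottom.hist(mydata2, bins, color="tab:orange")
ax_bottom.set_title("mydata2")
fig.tight_layout()
```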

Answer 10

Just in case you have pandas (import pandas as pd) or are OK with using it:

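import random
import pandas as pd
import matplotlib.pyplot as plt
test = pd.DataFrame([[random.gauss(3,1) for _ in range(400)],
                     [random.gauss(4,2) for _ in range(400)]])
# each row is one dataset; transpose so that each column is one dataset for hist
plt.hist(test.values.T)
plt.show()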

Answer 11

There is one caveat when you want to plot the histograms from a 2-D numpy array: hist treats each column as a separate dataset, so if your datasets are stored as rows you need to swap the 2 axes.

import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(size=(2, 300))
# swapped_data.shape == (300, 2)
swapped_data = np.swapaxes(data, axis1=0, axis2=1)
plt.hist(swapped_data, bins=30, label=['x', 'y'])
plt.legend()
plt.show()
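The effect of the swap can be seen from the shape of the counts hist returns. A minimal sketch with a small toy array (assumed): a (2, 6) array is read as 6 datasets of 2 values each, while its transpose, equivalent to the swapaxes call above, gives the intended 2 datasets of 6 values.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(2, 6))  # 2 variables stored as rows, 6 samples each

fig, ax = plt.subplots()
n_wrong, _, _ = ax.hist(data, bins=3)   # columns are datasets: 6 datasets of 2 values
ax.cla()
n_right, _, _ = ax.hist(data.T, bins=3, label=["x", "y"])  # 2 datasets of 6 values
ax.legend()
```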

Answer 12

This question has been answered before, but I wanted to add another quick/easy workaround that might help other visitors to this question:

import seaborn as sns
sns.kdeplot(mydata1)
sns.kdeplot(mydata2)

Some helpful examples of kde vs histogram comparison are here.
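kdeplot draws a kernel density estimate, a smooth curve, instead of bars. To show what is being plotted, here is a minimal 1-D Gaussian KDE in plain NumPy. It is a sketch assuming Scott's rule for the bandwidth (h = standard deviation times n^(-1/5)), a common default; seaborn's exact bandwidth handling may differ, and the helper name gaussian_kde_1d and the stand-in data are hypothetical.

```python
import numpy as np

def gaussian_kde_1d(data, grid):
    """Hypothetical helper: 1-D Gaussian KDE evaluated on `grid` (Scott's rule bandwidth)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    h = data.std(ddof=1) * n ** (-1 / 5)  # Scott's rule of thumb for 1-D data
    # One Gaussian bump of width h centred on each sample, averaged over all samples
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(5)
mydata1 = rng.normal(3, 1, 400)   # stand-in data (assumed)
mydata2 = rng.normal(4, 2, 400)
grid = np.linspace(-10, 15, 1001)
kde1 = gaussian_kde_1d(mydata1, grid)
kde2 = gaussian_kde_1d(mydata2, grid)

# Each estimate is a density, so it integrates to about 1 over a wide enough grid
step = grid[1] - grid[0]
area1 = kde1.sum() * step
area2 = kde2.sum() * step
```

Because both curves are densities, two datasets of very different sizes end up on a comparable vertical scale, which is part of why KDEs are convenient for this comparison.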

Answer 13

Inspired by Solomon's answer, but sticking to the question, which is about histograms, a clean solution is:

sns.distplot(bar)
sns.distplot(foo)
plt.show()

Make sure to plot the taller one first; otherwise you would need to set plt.ylim(0, 0.45) so that the taller histogram is not chopped off.

Question description

13 solutions

Solution 1: 497, accepted, 2011-07-29 13:33:44

Solution 2: 228, 2016-09-14 02:41:04

Solution 3: 35, 2017-12-11 10:05:59

Solution 4: 15, 2018-12-10 01:48:00

Solution 5: 11, 2018-07-31 14:48:37

Solution 6: 7, 2017-07-05 11:56:36

Solution 7: 3, 2020-04-30 08:06:26

Solution 8: 3, 2020-12-25 20:13:27

Solution 9: 3, 2011-07-29 09:50:25

Solution 10: 2, 2017-06-16 12:35:46

Solution 11: 2, 2019-12-05 15:44:26

Solution 12: 1, 2019-04-30 18:07:04

Solution 13: 1, 2019-06-18 03:55:22
