Plotting Multiple Histograms in Matplotlib - Colors or side-by-side bars
Problem: When plotting multiple histograms in Matplotlib, I cannot tell one plot from another.
Problem as image: (image not included)

Minor problem: The 'Count' label on the left is partially cut off outside the image. Why?
Description
I want to plot histograms of 3 different sets. Each set is an array of 0s and 1s. I want a histogram of each so I can detect imbalances in the dataset.
I have them plotted separately, but I want a single graphic with all of them together.
A different graphic with side-by-side bars would also be fine. I even googled plotting it in 3D, but I don't know how easy such a graphic would be to read and understand.
Right now, I want to plot the [train], [validation] and [test] bars side by side on the same graphic, something like this:
PS: My googling didn't return any code I could understand. Also, I would appreciate it if someone could check whether I'm doing anything crazy in my code.
Thanks a lot, guys!
Code:
```python
import base64
import io

import matplotlib.pyplot as plt
import numpy as np


def generate_histogram_from_array_of_labels(Y=[], labels=[], xLabel="Class/Label", yLabel="Count", title="Histogram of Trainset"):
    plt.figure()
    plt.clf()
    colors = ["b", "r", "m", "w", "k", "g", "c", "y"]
    information = []
    for index in range(0, len(Y)):
        y = Y[index]
        # Fall back to the first color once the palette is exhausted
        if index >= len(colors):
            color = colors[0]
        else:
            color = colors[index]
        if labels is None:
            label = "?"
        else:
            if index < len(labels):
                label = labels[index]
            else:
                label = "?"
        # Rows of [class value, number of occurrences]
        unique, counts = np.unique(y, return_counts=True)
        unique_count = np.empty(shape=(unique.shape[0], 2), dtype=np.uint32)
        for x in range(0, unique.shape[0]):
            unique_count[x, 0] = unique[x]
            unique_count[x, 1] = counts[x]
        information.append(unique_count)
        # the histogram of the data (one bin per distinct value)
        n, bins, patches = plt.hist(y, unique.shape[0], density=False, facecolor=color, alpha=0.75, range=[np.min(unique), np.max(unique) + 1], label=label)
        # One tick at the center of each bar, labelled with the class value
        xticks_pos = [0.5 * patch.get_width() + patch.get_xy()[0] for patch in patches]
        plt.xticks(xticks_pos, unique)
    plt.xlabel(xLabel)
    plt.ylabel(yLabel)
    plt.title(title)
    plt.grid(True)
    plt.legend()
    # plt.show()
    # Render the figure to an in-memory PNG and return it base64-encoded
    string_of_graphic_image = io.BytesIO()
    plt.savefig(string_of_graphic_image, format='png')
    string_of_graphic_image.seek(0)
    return base64.b64encode(string_of_graphic_image.read()), information
```
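On the "am I doing anything crazy" question: the inner loop that fills `unique_count` can be written without an explicit loop. A minimal sketch, assuming only NumPy: `np.column_stack` pairs each distinct value with its count, giving the same two-column array (cast to `np.uint32` to match the original dtype). The helper name `count_labels` is hypothetical.

```python
import numpy as np


def count_labels(y):
    """Return a (k, 2) array of [value, count] rows for the distinct values in y."""
    unique, counts = np.unique(y, return_counts=True)
    # Stack the two 1-D arrays as columns: row i is [unique[i], counts[i]]
    return np.column_stack((unique, counts)).astype(np.uint32)


example = np.array([0, 1, 1, 0, 1])
print(count_labels(example))
```

Inside the loop, `information.append(count_labels(y))` would then replace the `np.empty` allocation and the `for x in ...` loop.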
Edit
Following hashcode's answer, this new code:
```python
import matplotlib.pyplot as plt
import numpy as np


def generate_histogram_from_array_of_labels(Y=[], labels=[], xLabel="Class/Label", yLabel="Count", title="Histogram of Trainset"):
    plt.figure()
    plt.clf()
    colors = ["b", "r", "m", "w", "k", "g", "c", "y"]
    to_use_colors = []  # note: collected here but never passed to plt.hist below
    information = []
    for index in range(0, len(Y)):
        y = Y[index]
        if index >= len(colors):
            to_use_colors.append(colors[0])
        else:
            to_use_colors.append(colors[index])
        unique, counts = np.unique(y, return_counts=True)
        unique_count = np.empty(shape=(unique.shape[0], 2), dtype=np.uint32)
        for x in range(0, unique.shape[0]):
            unique_count[x, 0] = unique[x]
            unique_count[x, 1] = counts[x]
        information.append(unique_count)
    unique, counts = np.unique(Y[0], return_counts=True)
    histrange = [np.min(unique), np.max(unique) + 1]
    # the histogram of the data: all datasets in one call, drawn side by side
    n, bins, patches = plt.hist(Y, 1000, density=False, alpha=0.75, range=histrange, label=labels)
    #xticks_pos = [0.5 * patch.get_width() + patch.get_xy()[0] for patch in patches]
    #plt.xticks(xticks_pos, unique)
    plt.xlabel(xLabel)
    plt.ylabel(yLabel)
    plt.title(title)
    plt.grid(True)
    plt.legend()
```
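A likely reason this version looks wrong: for 0/1 labels, `histrange` is `[0, 2]`, and 1000 bins over that range makes every bin 0.002 wide, so each bar is a hairline and only two bins are non-empty. A small sketch with `np.histogram`, which uses the same binning rules that `plt.hist` relies on, illustrates this; the array `a` is made-up data for illustration.

```python
import numpy as np

# Hypothetical 0/1 label array standing in for one of the datasets
a = np.array([0, 0, 1, 1, 1])

# Same bins and range as the edited code: 1000 bins over [0, 2]
counts, edges = np.histogram(a, bins=1000, range=[0, 2])
print(edges[1] - edges[0])        # width of a single bin
print(np.count_nonzero(counts))   # only the bins holding 0 and 1 are non-empty

# Two bins over [0, 1] give one bar per class instead
counts2, edges2 = np.histogram(a, bins=2, range=[0, 1])
print(counts2, edges2)
```

With two bins the edges are 0, 0.5 and 1, which is why tick positions of 0.25 and 0.75 line up with the two classes.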
It produces this:
New edit:
```python
import matplotlib.pyplot as plt
import numpy as np


def generate_histogram_from_array_of_labels(Y=[], labels=[], xLabel="Class/Label", yLabel="Count", title="Histogram of Trainset"):
    plt.figure()
    plt.clf()
    information = []
    for index in range(0, len(Y)):
        y = Y[index]
        unique, counts = np.unique(y, return_counts=True)
        unique_count = np.empty(shape=(unique.shape[0], 2), dtype=np.uint32)
        for x in range(0, unique.shape[0]):
            unique_count[x, 0] = unique[x]
            unique_count[x, 1] = counts[x]
        information.append(unique_count)
    # Passing the whole list Y draws the datasets as side-by-side bars
    n, bins, patches = plt.hist(Y, density=False, alpha=0.75, label=labels)
    plt.xticks((0.25, 0.75), (0, 1))
    plt.xlabel(xLabel)
    plt.ylabel(yLabel)
    plt.title(title)
    plt.grid(True)
    plt.legend()
```
It's working now, but the label on the left side is somewhat out of bounds, and I'd like to center the bars better... How can I do that?
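Regarding that follow-up: a common cause of a clipped y-axis label is that the default figure margins leave too little room on the left. A minimal sketch, assuming the figure is saved to an in-memory PNG as in the first version of the function: `fig.tight_layout()` recomputes the margins so the labels fit, and `bbox_inches="tight"` in `savefig` crops the saved image around everything drawn. For centering, bin edges of `[-0.5, 0.5, 1.5]` give one bin per class centered on 0 and 1, so the ticks can simply be `[0, 1]`. The data and names below are made up for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 0/1 label arrays standing in for train/validation/test
rng = np.random.default_rng(0)
Y = [rng.integers(0, 2, size=n) for n in (1000, 200, 200)]
labels = ["train", "validation", "test"]

fig, ax = plt.subplots()
# Edges at -0.5, 0.5, 1.5: one bin per class, centered on 0 and 1;
# with a list of arrays, hist draws the datasets side by side in each bin
n, bins, patches = ax.hist(Y, bins=[-0.5, 0.5, 1.5], label=labels)
ax.set_xticks([0, 1])
ax.set_xlabel("Class/Label")
ax.set_ylabel("Count")
ax.legend()

fig.tight_layout()  # adjust margins so the axis labels fit inside the figure
buf = io.BytesIO()
# bbox_inches="tight" additionally crops the saved image around all artists
fig.savefig(buf, format="png", bbox_inches="tight")
plt.close(fig)
png_bytes = buf.getvalue()
```

In `generate_histogram_from_array_of_labels`, the equivalent would be calling `plt.tight_layout()` and passing `bbox_inches="tight"` to the existing `plt.savefig` call.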
I tried and came up with this. All you have to do is pass a tuple of datasets to `plt.hist`; it can't be simpler! You can change the xticks positions in the code as needed. So, supposing you have two lists of 0s and 1s, this is what you do:
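```python
import matplotlib.pyplot as plt
import numpy as np

a = np.random.randint(2, size=1000)
b = np.random.randint(2, size=1000)
# A tuple of datasets is drawn as side-by-side bars within each bin
plt.hist((a, b), 2, label=("data1", "data2"))
plt.legend()
# With 2 bins over [0, 1], the bins are centered at 0.25 and 0.75
plt.xticks((0.25, 0.75), (0, 1))
```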
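To see why the tick positions 0.25 and 0.75 work: with 2 bins over the data range [0, 1], the bin edges are 0, 0.5 and 1, so the bins are centered at 0.25 and 0.75, and `plt.hist` returns one row of counts per dataset. A small check, assuming random 0/1 arrays like the ones above (seeded here so the run is reproducible):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this check
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
a = rng.integers(0, 2, size=1000)
b = rng.integers(0, 2, size=1000)

n, bins, patches = plt.hist((a, b), 2, label=("data1", "data2"))
print(bins)                   # bin edges; the bin centers are 0.25 and 0.75
print(n[0], np.bincount(a))   # row 0 holds the counts of 0s and 1s in a
plt.close("all")
```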
The exact code I tried to run (after changing the number of bins to 2):
```python
import numpy as np

a = np.random.randint(2, size=1000)
b = np.random.randint(2, size=1000)
y = [a, b]
labels = ["data1", "data2"]
generate_histogram_from_array_of_labels(Y = y, labels = labels)
```
And I got the same result...
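If you want full control over the bar positions instead of relying on `plt.hist`'s binning, one alternative (a different technique from the answer above, swapped in for illustration) is to count each class yourself with `np.bincount` and draw grouped bars with `plt.bar`, shifting each dataset by one bar width so the group stays centered on its class value. The names `datasets` and `width` and the generated data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
datasets = {
    "train": rng.integers(0, 2, size=800),
    "validation": rng.integers(0, 2, size=100),
    "test": rng.integers(0, 2, size=100),
}

classes = np.arange(2)          # the possible label values 0 and 1
width = 0.8 / len(datasets)     # the bars of one group share 80% of a unit
fig, ax = plt.subplots()
for i, (name, y) in enumerate(datasets.items()):
    counts = np.bincount(y, minlength=len(classes))
    # Offset each dataset so the whole group is centered on its class value
    offset = (i - (len(datasets) - 1) / 2) * width
    ax.bar(classes + offset, counts, width=width, label=name)
ax.set_xticks(classes)
ax.set_xlabel("Class/Label")
ax.set_ylabel("Count")
ax.legend()
fig.tight_layout()
```

Because the counts come straight from `np.bincount`, there are no bin edges to tune; each bar is exactly one class of one dataset.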
If your datasets are of equal length, you might be able to do this easily with pandas. So, assuming you have
```python
import numpy
N = 1000
train, validation, test = [numpy.random.randint(2, size=N) for _ in range(3)]
Y = [train, validation, test]
```
You can simply do
```python
import pandas
df = pandas.DataFrame(list(zip(*Y)), columns=['Train', 'Validation', 'Test'])
# Count 0s and 1s per column (top-level pandas.value_counts is deprecated)
df.apply(pandas.Series.value_counts).plot.bar()
```
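`df.apply(pandas.Series.value_counts)` counts each column separately, producing a small table with one row per class (0 and 1) and one column per dataset; `plot.bar()` then draws one group of bars per row. A quick way to inspect that intermediate table, with the data seeded for reproducibility and the frame built from a dict of columns (equivalent to the `zip` version above):

```python
import numpy
import pandas

numpy.random.seed(0)
N = 1000
train, validation, test = [numpy.random.randint(2, size=N) for _ in range(3)]
df = pandas.DataFrame({"Train": train, "Validation": validation, "Test": test})

# One row per class value, one column per dataset
counts = df.apply(pandas.Series.value_counts)
print(counts)
```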
which results in this plot:
If you also `import seaborn`, it looks a bit nicer: