
Multiple side-by-side histograms with matplotlib?

I have a piece of software that has to process lots of different data and can take a varying amount of time to process it. As the software gets revised, the time needed to process the data changes, and so I want to create a graph that shows the variance in time as well as outliers, because ideally, this program should take about the same amount of time for each piece of data (It sounds strange and unrealistic, I know, but just roll with me here).

At first, I thought about using box plots, but I found them inadequate: it is entirely possible for half of a data set to cluster around one value and the other half around another, and I didn't feel a box plot would illustrate that well. So I decided to try a histogram, but I can't figure out how to get matplotlib to draw it the way I want. I want a single figure, with the X-axis labeled with software versions and the Y-axis showing the time taken to process a data set, containing multiple histograms, like this mockup I made:

(image: mockup of the desired plot)

This graph would show that in version 0.1, most data sets were processed in 2-4 seconds, with a bunch of sets for some reason taking 12 seconds. v0.1a got rid of those long outliers, but everything took longer. 0.1b is just slightly faster than 0.1a. Finally, 0.2 shows a large speed improvement, but introduces outliers again.

How can I get matplotlib to create a plot like that?

Here is a (very) basic mockup of how this can be achieved:

import matplotlib.pyplot as plt
import numpy as np

number_of_bins = 20
number_of_data_points = 1000

ax = plt.subplot(111)

data_set = [np.random.normal(0, 1, number_of_data_points),
            np.random.normal(6, 1, number_of_data_points),
            np.random.normal(-3, 1, number_of_data_points)]

MID_VALUES = [0, 200, 400]
labels = ["v1", "v2", "v3"]


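for MID_VAL, y in zip(MID_VALUES, data_set):

    # Bin counts and bin edges for this data set
    hist, bin_edges = np.histogram(y, bins=number_of_bins)

    bottom = bin_edges[:-1]          # lower edge of each bar
    heights = np.diff(bin_edges)     # bar thickness = bin width
    lefts = MID_VAL - .5 * hist      # center each bar on MID_VAL

    # align="edge" so that `bottom` is treated as the bar's lower edge;
    # recent matplotlib versions center bars on the given y by default
    ax.barh(bottom, hist, height=heights, left=lefts, align="edge")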

ax.set_xticks(MID_VALUES)
ax.set_xticklabels(labels)

plt.show()

(image: resulting plot)

This admittedly lacks a lot of refinement. For example, the MID_VALUES are chosen by hand; suitable values depend on the data set, and choosing them could be automated. Nevertheless, you may be able to adapt it into a more usable form.
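One possible way to automate the spacing, as a rough sketch rather than a definitive solution: bin every data set on the same edges, then space the centers by the largest bin count so that neighbouring histograms cannot overlap. The helper names `side_by_side_layout` and `plot_side_by_side` and the `gap` parameter are made up for this illustration; they are not part of matplotlib or numpy.

```python
import numpy as np
import matplotlib.pyplot as plt


def side_by_side_layout(data_sets, number_of_bins=20, gap=0.1):
    """Hypothetical helper: shared bin edges, per-set counts, spaced centers."""
    # Use one set of bin edges for all data sets so the bars line up vertically
    lo = min(np.min(d) for d in data_sets)
    hi = max(np.max(d) for d in data_sets)
    bin_edges = np.linspace(lo, hi, number_of_bins + 1)

    counts = [np.histogram(d, bins=bin_edges)[0] for d in data_sets]

    # Each histogram is at most max_count wide; `gap` adds some padding
    max_count = max(c.max() for c in counts)
    centers = (1 + gap) * max_count * np.arange(len(data_sets))
    return bin_edges, counts, centers


def plot_side_by_side(data_sets, labels, number_of_bins=20, gap=0.1):
    """Hypothetical helper: draw one centered horizontal histogram per data set."""
    bin_edges, counts, centers = side_by_side_layout(data_sets, number_of_bins, gap)
    fig, ax = plt.subplots()
    for center, hist in zip(centers, counts):
        ax.barh(bin_edges[:-1], hist, height=np.diff(bin_edges),
                left=center - 0.5 * hist, align="edge")
    ax.set_xticks(centers)
    ax.set_xticklabels(labels)
    return fig, ax
```

Calling `plot_side_by_side(data_set, labels)` followed by `plt.show()` should give a figure similar to the one above, with the tick positions derived from the data instead of being picked by hand.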
