
Python Matplotlib - “weighted” boxplot

I'm trying to create a boxplot in which each value comes with a count that gives the number of times the value appears in the data.

What I Have:

import numpy as np
import matplotlib.pyplot as plt

data = np.array([[[0, 1, 2, 3], [31, 84, 2, 1]], [[0, 1, 2], [17, 104, 21]], [[0, 1, 2, 3, 4], [17, 106, 61, 3, 1]]])
plt.boxplot([data[0][0], data[1][0], data[2][0]])

Output:

[image: boxplot produced by the code above]

What I want:

  • First Box: The data '0' to appear 31 times, '1' to appear 84 times etc (Same for all boxes)
  • Which would shift the quartile ranges, median line etc

I know I can do something like: (for each box)

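merged_list_box1 = np.array([])
# repeat each value by its count; np.append returns a new array, so reassign it
merged_list_box1 = np.append(merged_list_box1, [data[0][0][0]] * 31)
merged_list_box1 = np.append(merged_list_box1, [data[0][0][1]] * 84)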
.
.
.

But with the dataset I have, the merged list for a single box can be over 500 elements long, and I have about 20 such boxes. Is there a more efficient method?

Thanks in advance!

First off, a list of lists can only be converted to a regular multi-dimensional numpy array if all the sublists have the same number of elements. For ragged input like this one, NumPy 1.19 through 1.23 emit a VisibleDeprecationWarning and build an object array whose elements are simply the original Python lists. NumPy 1.24 and later raise a ValueError unless dtype=object is passed explicitly.

Also note that np.append() is a slow operation: it creates a complete copy of the array on every call, so it should be used sparingly. See e.g. "How can I append to a numpy array without reassigning the result to a new variable?".
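As a small illustration of the point about np.append(), the sketch below uses the counts of the first box from the question. The variable names (`values`, `counts`, `chunks`, `merged_once`) are only illustrative, and the single np.concatenate() call is just one possible way to avoid the repeated copies.

```python
import numpy as np

values = [0, 1, 2, 3]
counts = [31, 84, 2, 1]

# np.append returns a new array; without reassignment the result is lost.
merged = np.array([])
np.append(merged, [0] * 31)
print(merged.size)  # 0

# Reassigning works, but every call copies everything collected so far.
for v, c in zip(values, counts):
    merged = np.append(merged, [v] * c)

# Alternative: build the pieces first and concatenate them once.
chunks = [np.full(c, v) for v, c in zip(values, counts)]
merged_once = np.concatenate(chunks)
print(merged_once.size)  # 118
```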

Repeating each element of a list the number of times given in a second list can be done with np.repeat(). Generated numpy arrays with 500 or more elements aren't a problem. So, the code could look like:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# a plain list of [values, counts] pairs; the sublists have different lengths,
# so wrapping them in np.array() would warn or fail depending on the numpy version
data = [[[0, 1, 2, 3], [31, 84, 2, 1]], [[0, 1, 2], [17, 104, 21]], [[0, 1, 2, 3, 4], [17, 106, 61, 3, 1]]]
plt.boxplot([np.repeat(d[0], d[1]) for d in data])
plt.gca().yaxis.set_major_locator(MaxNLocator(integer=True))
plt.show()

[image: resulting plot]

In this example, the second "box" looks like a line because its first and third quartiles are both equal to 1. As all the input values are integers, the example code forces the ticks to be integers.
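The collapsed second box can be checked numerically. The sketch below expands the second box's values with np.repeat() and computes the quartiles with np.percentile(), assuming the default linear interpolation; the names `expanded`, `q1`, `median` and `q3` are only illustrative.

```python
import numpy as np

# second box from the example: 0 appears 17 times, 1 appears 104 times, 2 appears 21 times
expanded = np.repeat([0, 1, 2], [17, 104, 21])
print(len(expanded))  # 142

# first quartile, median and third quartile of the expanded data
q1, median, q3 = np.percentile(expanded, [25, 50, 75])
print(q1, median, q3)  # 1.0 1.0 1.0
```

Since 104 of the 142 values are equal to 1, both quartiles fall inside that run, which leaves the box with zero height.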

The data could be simplified a bit, assuming every first sublist is just the sequence of consecutive integers starting at zero. Then only the counts need to be stored:

data = [[31, 84, 2, 1], [17, 104, 21], [17, 106, 61, 3, 1]]
plt.boxplot([np.repeat(np.arange(len(d)), d) for d in data])
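A quick way to convince oneself that the counts-only form is equivalent is to compare the expanded arrays of both variants. This is only a sanity-check sketch; `pairs`, `counts_only`, `explicit` and `implicit` are names made up for the illustration.

```python
import numpy as np

# [values, counts] pairs as in the original data
pairs = [[[0, 1, 2, 3], [31, 84, 2, 1]],
         [[0, 1, 2], [17, 104, 21]],
         [[0, 1, 2, 3, 4], [17, 106, 61, 3, 1]]]
counts_only = [counts for _, counts in pairs]

# expand with explicit values, and with values implied by the position of each count
explicit = [np.repeat(values, counts) for values, counts in pairs]
implicit = [np.repeat(np.arange(len(counts)), counts) for counts in counts_only]

same = all(np.array_equal(a, b) for a, b in zip(explicit, implicit))
sizes = [len(a) for a in implicit]
print(same)   # True
print(sizes)  # [118, 142, 188]
```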
