Python matplotlib - boxplot subplot + scatter plot

I am trying to draw a scatter point inside each boxplot when the boxplots are arranged as subplots. With just one boxplot it works: I can place a specific point with a specific color inside the boxplot. The green dot (Image 1) represents a specific number in comparison with the boxplot values.

  for columnName in data_num.columns:
    plt.figure(figsize=(2, 2), dpi=100)
    bp = data_num.boxplot(column=columnName, grid=False)
    y = S[columnName]
    x = columnName
    if y > data_num[columnName].describe().iloc[5]:
      plt.plot(1, y, 'r.', alpha=0.7,color='green',markersize=12)
      count_G = count_G + 1
    elif y < data_num[columnName].describe().iloc[5]:
      plt.plot(1, y, 'r.', alpha=0.7,color='red',markersize=12)
      count_L = count_L + 1
    else:
      plt.plot(1, y, 'r.', alpha=0.7,color='yellow',markersize=12)
      count_E = count_E + 1

Image 1 - Scatter + 1 boxplot

I can create a subplot with boxplots.

  fig, axes = plt.subplots(6,10,figsize=(16,16)) # create figure and axes
  fig.subplots_adjust(hspace=0.6, wspace=1)

  for j,columnName in enumerate(list(data_num.columns.values)[:-1]):
    bp = data_num.boxplot(columnName,ax=axes.flatten()[j])

Image 2 - Subplots + Boxplots
But when I try to plot a specific number inside each boxplot, it overwrites the entire plot.

plt.subplot(6,10,j+1)  
if y > data_num[columnName].describe().iloc[5]:
  plt.plot(1, y, 'r.', alpha=0.7,color='green',markersize=12)
  count_G = count_G + 1
elif y < data_num[columnName].describe().iloc[5]:
  plt.plot(1, y, 'r.', alpha=0.7,color='red',markersize=12)
  count_L = count_L + 1
else:
  plt.plot(1, y, 'r.', alpha=0.7,color='black',markersize=12)
  count_E = count_E + 1

Image 3 - Subplots + scatter

It is not completely clear what is going wrong. Most likely the call to plt.subplot(6,10,j+1) is replacing or clearing the axes that already contain the boxplots. However, such a call is not necessary with the standard modern use of matplotlib, where the subplots are created via fig, axes = plt.subplots(). Be careful to use ax.plot() instead of plt.plot(): plt.plot() draws on the "current" axes, which can be confusing when there are many subplots.
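As a minimal sketch of that difference (the 1x3 layout and the plotted values are made up for illustration), the snippet below counts the lines on each axes after one plt.plot() call and one ax.plot() call. Right after plt.subplots(), the current axes is the last one created, so plt.plot() ends up there, while ax.plot() draws on exactly the axes it is called on.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(6, 2))

# plt.plot() targets the "current" axes, here the last subplot that was created
plt.plot(1, 5, '.', color='red', markersize=12)

# ax.plot() targets the axes object it is called on, whichever axes is current
axes[0].plot(1, 5, '.', color='green', markersize=12)

# number of line artists on each subplot, left to right
lines_per_axes = [len(ax.lines) for ax in axes]
print(lines_per_axes)
```

The middle subplot stays empty, and the plt.plot() dot lands on a subplot that was never named explicitly, which is the kind of surprise ax.plot() avoids.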

The sample code below first creates some toy data (hopefully similar to the data in the question). Then the boxplots and the individual dots are drawn in a loop. To avoid repetition, the counts and the colors are stored in dictionaries. For a numeric column, data_num[columnName].describe().iloc[5] is the "50%" entry of describe() (after count, mean, std, min and 25%), i.e. the median, so for readability the code calculates that median directly.

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

column_names = list('abcdef')
# toy data: one reference value per column, and 20 random integers per column
S = {c: np.random.randint(2, 6) for c in column_names}
data_num = pd.DataFrame({c: np.random.randint(np.random.randint(0, 3), np.random.randint(4, 8), 20)
                         for c in column_names})
# marker color and counter per class: Greater than, Equal to, Less than the median
colors = {'G': 'limegreen', 'E': 'gold', 'L': 'crimson'}
counts = {c: 0 for c in colors}

fig, axes = plt.subplots(1, 6, figsize=(12, 3), gridspec_kw={'hspace': 0.6, 'wspace': 1})
for columnName, ax in zip(data_num.columns, axes.flatten()):
    data_num.boxplot(column=columnName, grid=False, ax=ax)
    y = S[columnName]  # in case S would be a dataframe with one row: y = S[columnName].values[0]
    data_median = data_num[columnName].median()
    classification = 'G' if y > data_median else 'L' if y < data_median else 'E'
    # draw the dot on this subplot's own axes; a single boxplot sits at x=1
    ax.plot(1, y, '.', alpha=0.9, color=colors[classification], markersize=12)
    counts[classification] += 1
print(counts)
plt.show()

Example plot
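The same loop should carry over to a larger grid such as the 6x10 layout in the question. The sketch below is only an illustration under assumptions: the 57 toy columns, their names (c0 to c56) and the reference values in S are invented here, and the grid cells left over at the end are simply switched off with set_axis_off().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# hypothetical data: 57 columns, fewer than the 60 cells of the grid
column_names = [f"c{i}" for i in range(57)]
data_num = pd.DataFrame(rng.integers(0, 8, size=(20, len(column_names))),
                        columns=column_names)
S = {c: int(rng.integers(2, 6)) for c in column_names}  # hypothetical reference values

colors = {'G': 'limegreen', 'E': 'gold', 'L': 'crimson'}
counts = {c: 0 for c in colors}

fig, axes = plt.subplots(6, 10, figsize=(16, 16),
                         gridspec_kw={'hspace': 0.6, 'wspace': 1})
flat_axes = axes.flatten()
for columnName, ax in zip(data_num.columns, flat_axes):
    data_num.boxplot(column=columnName, grid=False, ax=ax)
    y = S[columnName]
    data_median = data_num[columnName].median()
    classification = 'G' if y > data_median else 'L' if y < data_median else 'E'
    ax.plot(1, y, '.', alpha=0.9, color=colors[classification], markersize=12)
    counts[classification] += 1

# hide the grid cells that did not receive a column
for ax in flat_axes[len(data_num.columns):]:
    ax.set_axis_off()

print(counts)
```

Because zip() stops at the shorter of its two inputs, no index bookkeeping such as j+1 is needed, and no call to plt.subplot() is involved at any point.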
