
How to plot side by side boxplots with grouped data from different columns

I would like to visualize my data as side-by-side boxplots, grouped by two categorical columns that are each derived from one of two numerical columns.

Here is my attempt:

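import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# da is the dataframe holding the BPXSY1 and BPXSY2 columns

# create a list of our conditions
# (the last condition uses >= 175 so that a reading of exactly 175 is not left unlabeled)
conditions_BPXSY1 = [
    (da['BPXSY1'] < 125),
    (da['BPXSY1'] >= 125) & (da['BPXSY1'] < 175),
    (da['BPXSY1'] >= 175)
    ]
conditions_BPXSY2 = [
    (da['BPXSY2'] < 125),
    (da['BPXSY2'] >= 125) & (da['BPXSY2'] < 175),
    (da['BPXSY2'] >= 175)
    ]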

# create a list of the values we want to assign for each condition
values = ['< 125 mm Hg', '125 – 174 mm Hg', '175+ mm Hg']

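# create the new columns, using np.select to assign a label to each row from the lists above
# (default='' keeps the result a string column; rows matching no condition, e.g. NaN, get '')
#da.dropna(inplace=True)
da['BPXSY1x'] = np.select(conditions_BPXSY1, values, default='')
da['BPXSY2x'] = np.select(conditions_BPXSY2, values, default='')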

f, axes = plt.subplots(1, 2, figsize=(13, 6))
sns.boxplot(x="BPXSY1x", y="BPXSY1", data=da, order=values, orient='v', ax=axes[0])
sns.boxplot(x="BPXSY2x", y="BPXSY2", data=da, order=values, orient='v', ax=axes[1])

Here is the result:

[image]

However, I would like the result to look like the following, where J corresponds to BPXSY1 and R to BPXSY2 (of course, I have no S):

[image]

  • It looks like there is a single dataframe with two columns, 'BPXSY1' and 'BPXSY2'.
  • Plotting is largely a matter of reshaping the dataframe into the form the plotting API expects.
  • Instead of handling the columns separately, stack them into long form: one column labeling the study each reading came from, and another holding the blood pressure values.
  • Use pandas.cut to bin and label the blood pressure values.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# given dataframe df
   bpxsy1  bpxsy2
0      74    70.0
1      74    72.0
2      78    76.0

# stack the data columns
df = df.stack().reset_index(level=1).rename(columns={'level_1': 'stdy', 0: 'bp'}).reset_index(drop=True)

# display(df)
     stdy    bp
0  bpxsy1  74.0
1  bpxsy2  70.0
2  bpxsy1  74.0

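# bin the measurement values into left-closed intervals: [0, 125), [125, 150), [150, inf)
# the top edge is infinity so that readings of 175 and above still fall in the '150+' bin
bins = [0, 125, 150, float('inf')]
labels = ['< 125 mm Hg', '125 - 150 mm Hg', '150+ mm Hg']
df['bins'] = pd.cut(df.bp, bins=bins, labels=labels, right=False)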

# display(df)
     stdy    bp         bins
0  bpxsy1  74.0  < 125 mm Hg
1  bpxsy2  70.0  < 125 mm Hg
2  bpxsy1  74.0  < 125 mm Hg

# plot
plt.figure(figsize=(9, 7))
sns.boxplot(x='bins', y='bp', hue='stdy', data=df)

[image]
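For anyone who wants to check the reshaping and binning without the original data, here is a minimal sketch of the same idea. It makes a few assumptions that are not in the answer: the small dataframe `da` below is made-up sample data, `DataFrame.melt` is swapped in for the `stack` call as an alternative way to get long form, and the top bin is left open-ended with `np.inf`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's dataframe `da` (values are made up).
da = pd.DataFrame({
    "BPXSY1": [110.0, 124.0, 125.0, 149.0, 150.0, 180.0],
    "BPXSY2": [108.0, 130.0, 126.0, np.nan, 175.0, 200.0],
})

# Wide -> long: one row per (study, reading) pair, then drop missing readings.
long_df = (
    da.melt(value_vars=["BPXSY1", "BPXSY2"], var_name="stdy", value_name="bp")
    .dropna(subset=["bp"])
    .reset_index(drop=True)
)

# Left-closed bins: [0, 125), [125, 150), [150, inf).
bins = [0, 125, 150, np.inf]
labels = ["< 125 mm Hg", "125 - 150 mm Hg", "150+ mm Hg"]
long_df["bins"] = pd.cut(long_df["bp"], bins=bins, labels=labels, right=False)

# Number of readings per (bin, study); each group becomes one box in the plot.
counts = long_df.groupby(["bins", "stdy"], observed=True).size()
print(counts)
```

Passing `long_df` to the same `sns.boxplot(x='bins', y='bp', hue='stdy', data=long_df)` call used in the plot step should then draw one pair of boxes per bin. Unlike `stack`, `melt` keeps missing readings unless they are dropped explicitly, which is why the `dropna` step is there.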
