
Applying and plotting data with multi-label classification

I want to create plots (e.g. violin plots) from pandas dataframes whose rows can belong to multiple categories, ideally in a single figure. I'm not sure how to go about this, however. Any suggestions? Many thanks!

A simple example showing separate plots. Here, x is the main grouping variable, y is the data to be grouped, and z defines membership/category. For simplicity, I've just set z to a random integer in [0, 1, 2].

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# dummy data
np.random.seed(12345)
x = np.random.randint(1,6,1000)
y = np.random.randn(1000)
z = np.random.randint(0,3,1000)
df = pd.DataFrame(data=np.array([x,y,z]).T,columns=['x','y','z'])
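Stacking x, y and z with np.array([x,y,z]).T forces all three into one common dtype, so the integer columns x and z end up as floats (which is why the output in the answer shows values such as 3.0 and 0.0). A minimal alternative sketch, assuming you would rather keep x and z as integers, builds the frame from a dict instead:

```python
import numpy as np
import pandas as pd

np.random.seed(12345)
x = np.random.randint(1, 6, 1000)   # main grouping variable, values 1..5
y = np.random.randn(1000)           # data to be plotted
z = np.random.randint(0, 3, 1000)   # category membership, values 0..2

# A dict keeps one dtype per column, so x and z stay integers
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
print(df.dtypes)
```

With the same seed and the same call order, the values are identical to the ones above; only the dtypes of x and z differ.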

All data (for verification):

sns.violinplot(x='x',y='y',data=df)
plt.title('all data')

[Figure: violin plot of all data, regardless of z]
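Because z only takes the values 0, 1 and 2, the widest selection in the individual plots (z <= 2) keeps every row, so that panel should match this all-data plot. A quick check, assuming the same dummy data:

```python
import numpy as np
import pandas as pd

np.random.seed(12345)
df = pd.DataFrame({'x': np.random.randint(1, 6, 1000),
                   'y': np.random.randn(1000),
                   'z': np.random.randint(0, 3, 1000)})

widest = df.loc[df['z'] <= 2]
# Same rows in the same order, so both plots are drawn from identical data
print(widest.equals(df))  # True
```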

Individual plots:

fig,ax = plt.subplots(nrows=3,ncols=1,sharex=True)

sns.violinplot(x='x',y='y',data=df.loc[df['z']<=0],ax=ax[0])
ax[0].set_title('z <= 0')

sns.violinplot(x='x',y='y',data=df.loc[df['z']<=1],ax=ax[1])
ax[1].set_title('z <= 1')

sns.violinplot(x='x',y='y',data=df.loc[df['z']<=2],ax=ax[2])
ax[2].set_title('z <= 2')
plt.tight_layout();

[Figure: three violin plots of the data with z <= 0, z <= 1 and z <= 2, respectively]
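The three panels differ only in the threshold, so the same figure can be sketched with a loop. The 'Agg' backend line is an assumption added only so the sketch runs without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, only needed outside a notebook/GUI
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(12345)
df = pd.DataFrame({'x': np.random.randint(1, 6, 1000),
                   'y': np.random.randn(1000),
                   'z': np.random.randint(0, 3, 1000)})

thresholds = [0, 1, 2]
fig, axes = plt.subplots(nrows=len(thresholds), ncols=1, sharex=True)
for ax, t in zip(axes, thresholds):
    # cumulative selection: every row whose z is at most t
    sns.violinplot(x='x', y='y', data=df.loc[df['z'] <= t], ax=ax)
    ax.set_title(f'z <= {t}')
plt.tight_layout()
```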

What I'd like is a plot that looks like the following, except that the hue groups for 'z' should be the cumulative selections used in the plots above (z <= 0, z <= 1, z <= 2):

plt.figure()
sns.violinplot(x='x',y='y',data=df,hue='z');

[Figure: violin plot using hue='z', where each colour contains only the rows with z == 0, 1 or 2]
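The reason hue='z' does not give the desired grouping is that it partitions the rows: each row lands in exactly one colour. The requested groups are nested instead, so a single row can belong to several of them. Counting both on the same dummy data makes the difference visible:

```python
import numpy as np
import pandas as pd

np.random.seed(12345)
df = pd.DataFrame({'x': np.random.randint(1, 6, 1000),
                   'y': np.random.randn(1000),
                   'z': np.random.randint(0, 3, 1000)})

# hue='z': disjoint groups whose sizes add up to the number of rows
exact = df['z'].value_counts().sort_index()

# requested groups: z <= 0 is contained in z <= 1, which is contained in z <= 2
cumulative = pd.Series({f'z <= {t}': int((df['z'] <= t).sum()) for t in [0, 1, 2]})

print(exact.sum() == len(df))       # True
print(cumulative.sum() > len(df))   # True: overlapping groups count rows more than once
```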

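You can do this by creating a new dataframe that contains each selection of z you want to show, with a column labelling the selection, and passing that column as hue. Rows that fall into several selections simply appear several times: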

import numpy as np     # v 1.19.2
import pandas as pd    # v 1.1.3
import seaborn as sns  # v 0.11.0

# Create sample dataset
np.random.seed(12345)
x = np.random.randint(1,6,1000)
y = np.random.randn(1000)
z = np.random.randint(0,3,1000)
df = pd.DataFrame(data=np.array([x,y,z]).T,columns=['x','y','z'])

# Create new dataframe containing the selections of the 'z' variable
df0 = df.loc[df['z']<=0]
df1 = df.loc[df['z']<=1]
df2 = df.loc[df['z']<=2]
dfnew = pd.concat([df0, df1, df2], keys=['z <= 0', 'z <= 1', 'z <= 2'])
dfnew.reset_index(inplace=True)
dfnew.drop(columns='level_1', inplace=True)
dfnew.rename(columns={'level_0':'z selection'}, inplace=True)
dfnew.head()

#     z selection    x          y    z
#  0       z <= 0  3.0  -0.670121  0.0
#  1       z <= 0  3.0  -2.016201  0.0
#  2       z <= 0  2.0  -0.266742  0.0
#  3       z <= 0  2.0  -0.406730  0.0
#  4       z <= 0  2.0  -0.243281  0.0
ax = sns.violinplot(x='x', y='y', data=dfnew, hue='z selection')
ax.figure.set_size_inches(9, 6)

[Figure: violin plot with 'z selection' as hue (violinplot_zhue)]
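The same concat idea generalises to arbitrary, possibly overlapping selections, which covers genuinely multi-label data and not just nested thresholds. A minimal sketch with a hypothetical helper, stack_selections, that takes a dict mapping each label to a boolean mask and repeats a row once for every selection it belongs to. The helper name and its label_col parameter are illustrative only, not part of pandas or seaborn; the 'Agg' backend line is there only so it runs headless.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; not needed in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

def stack_selections(df, selections, label_col='selection'):
    """Hypothetical helper: one copy of each selected row per matching label."""
    parts = [df.loc[mask].assign(**{label_col: label})
             for label, mask in selections.items()]
    return pd.concat(parts, ignore_index=True)

np.random.seed(12345)
df = pd.DataFrame({'x': np.random.randint(1, 6, 1000),
                   'y': np.random.randn(1000),
                   'z': np.random.randint(0, 3, 1000)})

selections = {f'z <= {t}': df['z'] <= t for t in [0, 1, 2]}
dfnew = stack_selections(df, selections, label_col='z selection')

# hue_order fixes the legend/colour order to the order of the selections
ax = sns.violinplot(x='x', y='y', data=dfnew, hue='z selection',
                    hue_order=list(selections))
ax.figure.set_size_inches(9, 6)
```

Any boolean masks work here, for example one mask per tag when each row carries several tags in separate (hypothetical) indicator columns. Using assign plus pd.concat(..., ignore_index=True) also replaces the reset_index/drop/rename steps of the answer.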
