Difficulties using seaborn barplot with categorical data

I keep running into the same problem when using seaborn's "categorical" plotting functions to plot rates of categorical data.

I put together a simple example here that I could have sworn used to work with seaborn. I found a workaround using dummy variables, but this isn't always convenient. Does anyone know why my "Version 2" use case for barplot doesn't work?

import pandas as pd
from pandas import DataFrame
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Generate some example data of labels and associated values
outcomes = ['A' for _ in range(50)] + \
           ['B' for _ in range(20)] + \
           ['C' for _ in range(5)] 
trial = range(len(outcomes))

df = DataFrame({'Trial': trial, 'Outcome': outcomes})

plt.close('all')

# Version 1: This works but is a non-ideal workaround

# Generate separate boolean columns for each outcome
df2 = pd.get_dummies(df.Outcome).astype(bool)

plt.figure()
sns.barplot(data=df2, estimator=lambda x: 100 * np.mean(x))
plt.title('Outcomes V1')
plt.ylabel('Percent Trials')
plt.ylim([0,100])
plt.show()

# Version 2: This doesn't work and results in the following error
# unsupported operand type(s) for /: 'str' and 'int' 
plt.figure()
sns.barplot(x='Outcome', data=df, estimator=lambda x: 100 * np.mean(x))
plt.title('Outcomes V2')
plt.ylabel('Percent Trials')
plt.ylim([0,100])
plt.show()

This is the plot I would expect.

Adding the y parameter would work for you:

sns.barplot(x='Outcome', y='Trial', data=df, estimator=lambda x: 100 * np.mean(x))

However, in your case it makes more sense to plot with sns.countplot (since you want to treat trial 10 as one occurrence, not as the number ten):

sns.countplot(x='Outcome', data=df)

Or, if you want percentages, you could do something like:

sns.barplot(x='Outcome', y='Trial', data=df, estimator=lambda x: len(x) / len(df) * 100)  
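Another way to get the percentages, as a rough sketch rather than the one right answer, is to compute them with pandas first and hand seaborn a small summary frame, so no custom estimator is needed. The names pct and Percent are made up for this example; df is rebuilt the same way as in the question.

```python
import pandas as pd
import seaborn as sns

# Same example data as in the question: 50 x 'A', 20 x 'B', 5 x 'C'.
outcomes = ['A'] * 50 + ['B'] * 20 + ['C'] * 5
df = pd.DataFrame({'Trial': range(len(outcomes)), 'Outcome': outcomes})

# Share of trials per outcome, as a percentage (the column sums to 100).
# 'pct' and 'Percent' are illustrative names, not part of any API.
pct = (
    df['Outcome']
    .value_counts(normalize=True)   # fractions per outcome
    .mul(100)                       # convert to percent
    .rename_axis('Outcome')
    .reset_index(name='Percent')
)

# One row per outcome, so each bar is drawn at exactly that row's value.
ax = sns.barplot(x='Outcome', y='Percent', data=pct)
ax.set_ylabel('Percent Trials')
ax.set_ylim(0, 100)
```

Since the summary frame has a single row per outcome, the Trial column never enters the plot.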

Explanation

With a wide-form data frame (such as df2), you can pass just the data frame to the data parameter, and seaborn will automatically plot each numeric column along the x-axis.

With a long-form data frame (such as df), you need to pass arguments to both the x and y parameters.

From the sns.barplot docstring (emphasis added):

Input data can be passed in a variety of formats, including:

  • Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters.
  • A "long-form" DataFrame, in which case the x, y, and hue variables will determine how the data are plotted.
  • A "wide-form" DataFrame, such that each numeric column will be plotted.
  • Anything accepted by plt.boxplot (e.g. a 2d array or list of vectors)
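To make the wide-form versus long-form distinction concrete, here is a small pandas-only sketch that reshapes the dummy-variable frame from "Version 1" into long form with melt. The names wide, long, Occurred and rates are invented for the illustration.

```python
import pandas as pd

# Same example data as in the question: 50 x 'A', 20 x 'B', 5 x 'C'.
outcomes = ['A'] * 50 + ['B'] * 20 + ['C'] * 5
df = pd.DataFrame({'Trial': range(len(outcomes)), 'Outcome': outcomes})

# Wide form: one boolean column per outcome, one row per trial.
wide = pd.get_dummies(df['Outcome']).astype(bool)

# Long form: one row per (trial, outcome) pair, with an explicit
# categorical column ('Outcome') and a value column ('Occurred').
long = wide.melt(var_name='Outcome', value_name='Occurred')

# The mean of the boolean column per outcome, times 100, is the
# percentage of trials with that outcome.
rates = long.groupby('Outcome')['Occurred'].mean() * 100
```

With the long frame, a call such as sns.barplot(x='Outcome', y='Occurred', data=long, estimator=lambda v: 100 * np.mean(v)) passes both x and y explicitly, which is what the long-form case asks for.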
