Difficulty using seaborn barplot with categorical data

Question

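I keep running into the same problem when I use seaborn's "categorical" plotting functions to plot the rates of categorical data.

Below is a simple example that I could have sworn used to work with seaborn. I found a workaround using dummy variables, but that is not always convenient. Does anyone know why my "Version 2" use case doesn't work?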

import pandas as pd
from pandas import DataFrame
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Generate some example data of labels and associated values
outcomes = ['A' for _ in range(50)] + \
           ['B' for _ in range(20)] + \
           ['C' for _ in range(5)] 
trial = range(len(outcomes))

df = DataFrame({'Trial': trial, 'Outcome': outcomes})

plt.close('all')

# Version 1: This works but is a non-ideal workaround

# Generate separate boolean columns for each outcome
df2 = pd.get_dummies(df.Outcome).astype(bool)

plt.figure()
sns.barplot(data=df2, estimator=lambda x: 100 * np.mean(x))
plt.title('Outcomes V1')
plt.ylabel('Percent Trials')
plt.ylim([0,100])
plt.show()

# Version 2: This doesn't work and results in the following error
# unsupported operand type(s) for /: 'str' and 'int' 
plt.figure()
sns.barplot(x='Outcome', data=df, estimator=lambda x: 100 * np.mean(x))
plt.title('Outcomes V2')
plt.ylabel('Percent Trials')
plt.ylim([0,100])
plt.show()

The Version 1 plot is what I would expect to get.

Answer 1

Adding the y parameter makes the call work:

sns.barplot(x='Outcome', y='Trial', data=df, estimator=lambda x: 100 * np.mean(x))
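One plausible reading of the error message in Version 2, offered as an assumption rather than a description of seaborn's internals: when only x is given, the Outcome strings are the only values left to aggregate, and np.mean cannot average strings. A minimal sketch that reproduces the same kind of failure with NumPy alone (the names values and error are illustrative):

```python
import numpy as np

# An object-dtype array of labels, similar to what a pandas string column holds.
values = np.array(["A", "A", "B"], dtype=object)

error = None
try:
    # Summing the strings concatenates them; dividing the result by the count fails.
    np.mean(values)
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

With a numeric y column, the estimator receives numbers instead of labels, which is why the call above runs. Note that its bar heights are then 100 times the mean trial number in each group, not a rate.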

However, in your case it makes more sense to plot with sns.countplot (you want trial 10 to be treated as one observation, not as the actual number ten):

sns.countplot(x='Outcome', data=df)

If you want percentages instead, you can do the following:

sns.barplot(x='Outcome', y='Trial', data=df, estimator=lambda x: len(x) / len(df) * 100)
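An alternative that avoids a custom estimator is to compute the percentages with pandas first and then give barplot an explicit x and y. This is only a sketch: the helper names percent_by_outcome and plot_percentages and the column name Percent are assumptions made for illustration, not part of seaborn.

```python
import pandas as pd


def percent_by_outcome(df, column="Outcome"):
    """Return one row per category with its share of all rows, in percent."""
    shares = df[column].value_counts(normalize=True).mul(100)
    # Sort by label so the bar order does not depend on the counts.
    return shares.sort_index().rename_axis(column).reset_index(name="Percent")


def plot_percentages(pct, column="Outcome"):
    """Draw the precomputed percentages; seaborn is imported only when plotting."""
    import seaborn as sns
    ax = sns.barplot(x=column, y="Percent", data=pct)
    ax.set_ylim(0, 100)
    return ax


# Same example data as in the question.
outcomes = ["A"] * 50 + ["B"] * 20 + ["C"] * 5
df = pd.DataFrame({"Trial": range(len(outcomes)), "Outcome": outcomes})
pct = percent_by_outcome(df)
```

Calling plot_percentages(pct) and then plt.show() should draw one bar per outcome. Because each category has exactly one row, the bars carry no meaningful error bars.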

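Explanation

For a wide-form dataframe (such as df2), you can pass just the dataframe to the data parameter, and seaborn will automatically plot each numeric column along the x axis.

For a long-form dataframe (such as df), you need to pass arguments to both the x and y parameters.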
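The relationship between the two layouts can be sketched with pandas alone. The example below rebuilds the question's data, creates the wide-form frame of dummy columns, and melts it into a long-form frame with an explicit value column. The names wide_df, long_df and Occurred are assumptions chosen for this illustration.

```python
import pandas as pd

# Same example data as in the question.
outcomes = ["A"] * 50 + ["B"] * 20 + ["C"] * 5
df = pd.DataFrame({"Trial": range(len(outcomes)), "Outcome": outcomes})

# Wide form: one 0/1 column per outcome, one row per trial.
wide_df = pd.get_dummies(df["Outcome"]).astype(int)

# Long form: one row per (trial, outcome) pair, with the value in its own column.
long_df = wide_df.melt(var_name="Outcome", value_name="Occurred")

# Both layouts give the same rates: column means versus group means.
wide_rates = wide_df.mean() * 100
long_rates = long_df.groupby("Outcome")["Occurred"].mean() * 100
```

With long_df, both x='Outcome' and y='Occurred' are available to pass to barplot, which is the information the Version 2 call was missing.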

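From the sns.barplot docstring (emphasis added):

Input data can be passed in a variety of formats, including:

Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters.

A "long-form" DataFrame, in which case the x, y, and hue variables will determine how the data are plotted.

A "wide-form" DataFrame, such that each numeric column will be plotted.

Anything accepted by plt.boxplot (e.g. a 2d array or list of vectors)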
