
Seaborn/Matplotlib categorical plot markers size by count of observations

I want to scale the markers on a plot of 2 categorical variables by the count of observations.

I am using seaborn.pairplot for convenience because I have quite a lot of variables (features), but I don't think it has an argument for a case like this.

I am guessing that what you are looking for is a balloon plot, also known as a matrix bubble chart or a categorical bubble plot. To my knowledge, seaborn does not provide this type of plot as of version 0.11.0, so using pairplot is currently not an option. I know of two functions that provide this type of plot, each displaying a single categorical-to-categorical relationship with a selected numerical variable for the size of the markers: one in the pygal package and catscatter. The downside is that both require the count of observations as a column in your dataset, which I assume is not your case.
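If you do want to try one of those functions, the count column they expect can be built from the raw rows first. Below is a minimal sketch using `pd.crosstab` followed by `melt`; the helper name `count_table` and the toy data are made up for illustration and come from neither package.

```python
import pandas as pd


def count_table(df, x, y):
    """Return a long-format table with one row per (x, y) combination
    and a 'count' column. Hypothetical helper, for illustration only."""
    # Wide table of counts: one row per x value, one column per y value
    wide = pd.crosstab(df[x], df[y])
    # Back to long format: columns x, y, 'count'
    return wide.reset_index().melt(id_vars=x, var_name=y, value_name='count')


# Toy data standing in for a real dataset
toy = pd.DataFrame({
    'who':  ['man', 'man', 'woman', 'child', 'man', 'woman'],
    'town': ['S', 'C', 'S', 'S', 'S', 'C'],
})
counts = count_table(toy, 'who', 'town')
print(counts)
```

Unlike a plain `groupby(...).size()` on string columns, the crosstab route keeps combinations that never occur, with a count of 0. Whether that matters depends on how the plotting function treats zero-sized markers.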

Here is a way to create a balloon plot displaying the count of observations grouped by two categorical variables contained in a pandas dataframe:

import pandas as pd                # v 1.1.3
import matplotlib.pyplot as plt    # v 3.3.2
import seaborn as sns              # v 0.11.0

# Import seaborn sample dataset stored as a pandas dataframe and select
# the categorical variables to plot
df = sns.load_dataset('titanic')
x = 'who'  # contains 3 unique values: 'child', 'man', 'woman'
y = 'embark_town'  # contains 3 unique values: 'Southampton', 'Queenstown', 'Cherbourg'

# Compute the counts of observations
df_counts = df.groupby([x, y]).size().reset_index(name='count')

# Compute a size variable for the markers so that they have a good size regardless
# of the total count and the number of unique values in each categorical variable
scale = 500*df_counts['count'].size
size = df_counts['count']/df_counts['count'].sum()*scale

# Create matplotlib scatter plot with additional formatting
fig, ax = plt.subplots(figsize=(8,6))
ax.scatter(x, y, size, data=df_counts, zorder=2)
ax.grid(color='grey', linestyle='--', alpha=0.4, zorder=1)
ax.tick_params(length=0)
ax.set_frame_on(False)
ax.margins(.3)

plt.show()

[Figure: balloon plot]
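Since the question mentions having many variables, the same idea can be repeated for every pair of categorical columns. The following is only a rough sketch, not a drop-in replacement for pairplot: the function name `balloon_grid`, the `panel_size` parameter and the toy data are my own assumptions, and the panels are laid out in a single row rather than pairplot's square grid.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import pandas as pd


def balloon_grid(df, columns, panel_size=4):
    """Draw one balloon plot per pair of categorical columns.
    Hypothetical helper, for illustration only."""
    pairs = list(combinations(columns, 2))
    fig, axes = plt.subplots(
        1, len(pairs),
        figsize=(panel_size * len(pairs), panel_size),
        squeeze=False,  # always return a 2D array of axes
    )
    for ax, (x, y) in zip(axes[0], pairs):
        # Count observations for each (x, y) combination
        counts = df.groupby([x, y]).size().reset_index(name='count')
        # Same marker scaling as in the single-plot example
        scale = 500 * counts['count'].size
        size = counts['count'] / counts['count'].sum() * scale
        ax.scatter(counts[x].astype(str), counts[y].astype(str),
                   s=size, zorder=2)
        ax.grid(color='grey', linestyle='--', alpha=0.4, zorder=1)
        ax.tick_params(length=0)
        ax.set_frame_on(False)
        ax.margins(.3)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    return fig


# Toy data standing in for a real dataset
toy = pd.DataFrame({
    'who':  ['man', 'man', 'woman', 'child', 'man', 'woman'],
    'town': ['S', 'C', 'S', 'S', 'S', 'C'],
    'cls':  ['First', 'Third', 'First', 'Third', 'Third', 'Second'],
})
fig = balloon_grid(toy, ['who', 'town', 'cls'])
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the result. With many columns the number of pairs grows quickly, so it may be worth passing only the columns of interest.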

Sources of inspiration: catscatter, this answer
