Categorical bubble plot in Python

Question

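I have a dataset with many categorical variables and a binary target variable. Is there a package available in Python, or in other open-source GUI-based software, with which I can scatter-plot two categorical variables on the X and Y axes and use the target variable as the hue?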

I have looked at Seaborn's catplot, but it requires one axis to be numerical and the other to be categorical, so it does not work for this case.

For example, you can use the following dataset:

import seaborn as sns
data = sns.load_dataset('titanic')

This is the plot I want:

X axis: 'embark_town'
Y axis: 'class'
Hue: 'alive'
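A generic workaround for the catplot limitation described above (not part of the original question or answers) is to turn each categorical column into integer positions, after which any numeric scatter call can place one point per X/Y cell. The sketch below assumes a tiny hand-made frame in place of the titanic data; the rows are invented for illustration:

```python
import pandas as pd

# Hypothetical toy rows standing in for the titanic columns
df = pd.DataFrame({
    "embark_town": ["Southampton", "Cherbourg", "Southampton", "Queenstown"],
    "class": ["Third", "First", "First", "Third"],
    "alive": ["no", "yes", "yes", "no"],
})

# Map each category to an integer position (0, 1, 2, ...) that can serve as a
# numeric x or y coordinate; the returned uniques can be reused as tick labels.
x_pos, x_labels = pd.factorize(df["embark_town"], sort=True)
y_pos, y_labels = pd.factorize(df["class"], sort=True)

print(x_pos.tolist(), x_labels.tolist())  # [2, 0, 2, 1] ['Cherbourg', 'Queenstown', 'Southampton']
print(y_pos.tolist(), y_labels.tolist())  # [1, 0, 0, 1] ['First', 'Third']
```

matplotlib also accepts the strings directly (both answers below rely on this) and places categories in order of first appearance; factorizing simply makes the axis order explicit.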

Answer 1

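Like matplotlib, Seaborn supports plotting a categorical variable against another categorical variable. You can create semi-transparent markers that let you see both categories, although they can be hard to tell apart from a single marker if both have a similar size. The basic plot is fairly simple: we transform the dataframe with groupby and size to count the entries for each embark_town/class/alive triplet, use the marker colour (hue) for the alive category, and then create a scatter plot.

The legend entries, however, are the complicated part. Either the markers in the plot are small, or the markers in the legend are huge. I tried to balance the two, but I am not happy with the result. A lot of manual adjustment is needed here, so seaborn has no real advantage in this case. Any suggestions on how to simplify this in seaborn are welcome.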
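The counting step described above can be tried on its own before the full plot. This sketch uses a small made-up frame (not the titanic data) with the same three column names and shows how `groupby(...).size()` turns raw rows into one row per embark_town/class/alive triplet with a `people` count:

```python
import pandas as pd

# Hypothetical toy rows with the same column names as the titanic dataset
df = pd.DataFrame({
    "embark_town": ["Southampton", "Southampton", "Cherbourg", "Southampton", "Cherbourg"],
    "class":       ["Third",       "Third",       "First",     "First",       "First"],
    "alive":       ["no",          "no",          "yes",       "yes",         "no"],
})

X, Y, H = "embark_town", "class", "alive"

# One row per observed (X, Y, H) combination, holding the number of matching rows
plt_df = df.groupby([X, Y, H]).size().reset_index(name="people")
print(plt_df)
```

With the real titanic frame, `class` is a pandas categorical column; recent pandas versions (2.1 and later) emit a FutureWarning unless `observed=` is passed to `groupby`, and `observed=True` keeps only the combinations that actually occur.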

import seaborn as sns
import matplotlib.pyplot as plt

#dataframe and categories 
df = sns.load_dataset('titanic')
X = "embark_town"
Y = "class"
H = "alive"

#counting the X-Y-H category entries
#observed=True keeps only combinations that occur ('class' is a categorical column)
plt_df = df.groupby([X, Y, H], observed=True).size().to_frame(name="people").reset_index()

#figure preparation with grid and scaling
fig, ax = plt.subplots(figsize=(6,4))
ax.set_ylim(plt_df[Y].unique().size-0.5, -0.5)
ax.set_xlim(-0.5, plt_df[X].unique().size+1.0)
ax.grid(ls="--")

#the actual scatterplot with markersize representing the counted values
sns.scatterplot(x=X,
                y=Y,
                size="people",
                sizes=(100, 10000),
                alpha=0.5,
                edgecolor="black",
                hue=H,
                data=plt_df,
                ax=ax)

#creating two legends because the hue markers differ in size from the others
handles, labels = ax.get_legend_handles_labels()
l = ax.legend(handles[:3], labels[:3], title="The poor die first", markerscale=2, loc="upper right")
ax.add_artist(l)
#and seaborn plots the size markers in black, so you would get massive black blobs in the legend
#we change the color and make them transparent
for handle in handles:
    handle.set_facecolors((0, 1, 1, 0.5))
ax.legend(handles[4::2], labels[4::2], title="N° of people", loc="lower right", handletextpad=4, labelspacing=3, markerfirst=False)
plt.tight_layout()
plt.show()

Sample output: [image not included]
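Part of the legend difficulty comes from how seaborn applies `sizes=(100, 10000)`: according to the seaborn documentation, a tuple gives the minimum and maximum size and other values are normalized within that range. Assuming that normalization is linear (seaborn's default), marker area is not proportional to the number of people. The helper below is a hypothetical pure-Python model of that mapping, not seaborn code, applied to made-up counts:

```python
def linear_size_map(values, size_range=(100, 10000)):
    """Hypothetical model of seaborn's default numeric size mapping:
    the smallest value gets size_range[0], the largest size_range[1],
    and everything in between is interpolated linearly."""
    lo, hi = size_range
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) / (vmax - vmin) * (hi - lo) for v in values]

counts = [1, 100, 199]  # made-up people counts
areas = linear_size_map(counts)
print(areas)  # [100.0, 5050.0, 10000.0]
```

Here 100 people get 50.5 times the marker area of 1 person rather than 100 times. Answer 2 below instead multiplies every count by one factor (`scale = 10000/plt_df.vals.max()`), which keeps area proportional to the count. Under the same linear assumption, passing `size_norm=(0, max_count)` together with `sizes=(0, 10000)` to `sns.scatterplot` should have a similar effect.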

Answer 2

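My view is that if you have to rearrange a seaborn chart this heavily, you might as well build it from scratch with matplotlib. This also gives us the opportunity to take a different approach to displaying this categorical-vs-categorical plot: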

import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle
import numpy as np

#dataframe and categories 
import seaborn as sns
df = sns.load_dataset('titanic')

X = "embark_town"
Y = "class"
H = "alive"
bin_dic = {0: "yes", 1: "no"}

#counting the X-Y-H category entries
#observed=True keeps only combinations that occur ('class' is a categorical column)
plt_df = df.groupby([X, Y, H], observed=True).size().to_frame(name="vals").reset_index()

#figure preparation with grid and scaling
fig, ax = plt.subplots(figsize=(9, 6))
ax.set_ylim(plt_df[Y].unique().size-0.5, -0.5)
ax.set_xlim(-0.5, plt_df[X].unique().size+1.0)
ax.grid(ls="--")

#upscale factor for scatter marker size
scale=10000/plt_df.vals.max()
#left marker for category 0
ax.scatter(plt_df[plt_df[H]==bin_dic[0]][X], 
           plt_df[plt_df[H]==bin_dic[0]][Y], 
           s=plt_df[plt_df[H]==bin_dic[0]].vals*scale, 
           c=[(0, 0, 1, 0.5)], edgecolor="black", marker=MarkerStyle("o", fillstyle="left"), 
           label=bin_dic[0])
#right marker for category 1
ax.scatter(plt_df[plt_df[H]==bin_dic[1]][X], 
           plt_df[plt_df[H]==bin_dic[1]][Y], 
           s=plt_df[plt_df[H]==bin_dic[1]].vals*scale, 
           c=[(1, 0, 0, 0.5)], edgecolor="black", marker=MarkerStyle("o", fillstyle="right"), 
           label=bin_dic[1])

#legend entries for the two categories
l = ax.legend(title="Survived the catastrophe", ncol=2, framealpha=0, loc="upper right", columnspacing=0.1,labelspacing=1.5) 
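#enlarge both legend markers (legend_handles requires matplotlib >= 3.7, older versions call it legendHandles)
for handle in l.legend_handles:
    handle.set_sizes([800])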

#legend entries representing sizes
bubbles_n=5
bubbles_min = 50*(1+plt_df.vals.min()//50)
bubbles_step = 10*((plt_df.vals.max()-bubbles_min)//(10*(bubbles_n-1)))
bubbles_x = plt_df[X].unique().size+0.5

for i, bubbles_y in enumerate(np.linspace(0.5, plt_df[Y].unique().size-1, bubbles_n)): 
    #plot each legend bubble to indicate different marker sizes
    ax.scatter(bubbles_x, 
               bubbles_y,
               s=(bubbles_min + i*bubbles_step) * scale,
               c=[(1, 0, 1, 0.6)], edgecolor="black")
    #and label it with a value
    ax.annotate(bubbles_min+i*bubbles_step, xy=(bubbles_x, bubbles_y), 
                ha="center", va="center",
                fontsize="large", fontweight="bold", color="white")

plt.show()
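The size legend in this answer uses a bit of integer arithmetic to pick round legend values. The function below restates that arithmetic as a standalone helper (the function name is hypothetical; the formulas are copied from `bubbles_min` and `bubbles_step` above) and evaluates it for made-up minimum and maximum counts:

```python
def legend_bubble_values(vmin, vmax, n=5):
    """Hypothetical helper reproducing bubbles_min / bubbles_step from the answer."""
    # smallest multiple of 50 strictly greater than the minimum count
    start = 50 * (1 + vmin // 50)
    # step between legend values, rounded down to a multiple of 10
    step = 10 * ((vmax - start) // (10 * (n - 1)))
    return [start + i * step for i in range(n)]

print(legend_bubble_values(3, 350))  # [50, 120, 190, 260, 330]
```

Because the step is rounded down, the largest legend value never exceeds the maximum count (as long as the maximum is at least the first value), so after multiplying by `scale` the biggest legend bubble is no larger than the biggest data marker.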

Categorical bubble plot in Python

Problem description

2 solutions

Solution 1
1 2021-02-21 09:27:17

Solution 2
1 2021-02-21 20:05:17
