
Plot different regression functions for different variables with Seaborn PairGrid, regplot

How can I plot a regression in a seaborn PairGrid that depends on which variable is plotted, rather than on whether the panel sits in the upper, lower, or diagonal position? For example, in the tips data set I believe 'size' follows a second-order polynomial relationship with every other variable, so I want that fit across the entire 'size' row/column of the PairGrid and nowhere else. However, all I can do is map this fit onto every plot in the upper or lower triangle, like this:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

smoke = sns.PairGrid(tips, vars=['total_bill', 'tip','size'])
smoke.map_upper(sns.regplot, color = 'k', order=2)
smoke.map_diag(sns.kdeplot)
smoke.map_lower(sns.regplot, color = 'b')

Fig. 1

Is this possible with seaborn? Going further, what if I want to check/plot an exponential relationship between, e.g., 'tip' and 'total_bill' within the PairGrid? Is that possible, and how would I do it?

I know I can pull this specific case out and plot it separately, or use GridSpec, but I wonder if there is an easier way. Thanks.


EDIT (26.4.): An additional question is how to use hue in this setup. If I simply use:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
vars = ['total_bill', 'tip','size']

smoke = sns.PairGrid(tips, vars=vars, hue='smoker')
smoke.map_upper(plt.scatter)
smoke.map_diag(sns.kdeplot)
smoke.map_lower(plt.scatter)

# Add 2nd order polynomial regression to the 'size' column
for ax,y in zip(smoke.axes[:2,2],vars):
    sns.regplot(ax=ax, data=tips, x='size', y=y, order=2, scatter=False)
    ax.set_ylabel('')
    ax.set_xlabel('')

# Add logarithmic regression
sns.regplot(ax=smoke.axes[2,0], data=tips, x="total_bill", y='size', logx=True, scatter=False)

It does what I want, i.e. it fits a logarithmic regression, but the colors come out very strangely: the regression line is blue in the first row only, orange in the second row only, and green in the first column of the last row, as shown in the following picture. So my question is why this happens in the first place and how to fix it. Does hue create a new set of axes that then need to be iterated over?

Fig. 2 -- with hue added

PairGrid only lets you map functions onto the diagonal, the off-diagonal, and the upper and lower triangles. If you want finer-grained control over the plots, you can access the individual Axes objects through PairGrid.axes (a 2D array):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
vars = ['total_bill', 'tip','size']

smoke = sns.PairGrid(tips, vars=vars)
smoke.map_upper(plt.scatter, color = 'k')
smoke.map_diag(sns.kdeplot)
smoke.map_lower(plt.scatter, color = 'b')

# Add 2nd order polynomial regression to the 'size' column
for ax,y in zip(smoke.axes[:2,2],vars):
    sns.regplot(ax=ax, data=tips, x='size', y=y, order=2, color='k', scatter=False)

# Add logarithmic regression
sns.regplot(ax=smoke.axes[2,0], data=tips, x="total_bill", y='size', logx=True, color='b', scatter=False)


EDIT: solution that works with hue-splitting

In this case, you have to fit the regression on each subset of the data and plot the results on the same axes.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
vars = ['total_bill', 'tip','size']
hue_col = 'smoker'
hue_order=['Yes','No']

smoke = sns.PairGrid(tips, vars=vars, hue='smoker', hue_order=hue_order)
smoke.map_upper(plt.scatter)
smoke.map_diag(sns.kdeplot)
smoke.map_lower(plt.scatter)

# Add 2nd order polynomial regression to the 'size' column
for ax,y in zip(smoke.axes[:2,2],vars):
    for hue in hue_order:
        sns.regplot(ax=ax, data=tips.loc[tips[hue_col]==hue], x='size', y=y, order=2, scatter=False)
    ax.set_ylabel('')
    ax.set_xlabel('')

# Add logarithmic regression
for hue in hue_order:
    sns.regplot(ax=smoke.axes[2,0], data=tips.loc[tips[hue_col]==hue], x="total_bill", y='size', logx=True, scatter=False)


Yes, it's possible, because you can specify the x- and y-variables separately, e.g.:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

smoke = sns.PairGrid(tips, x_vars=['total_bill', 'tip','size'], y_vars=['size'])
smoke.map(sns.regplot, color = 'k', order=2)
smoke.map_diag(sns.kdeplot)

Pair plot example

To plot different kinds of regression functions, you would have to access each Axes (subplot) individually.

