
Is this an error in the seaborn.lineplot hue parameter?

With this code snippet, I'm expecting a line plot with one line per hue, which has these distinct values: [1, 5, 10, 20, 40].

import math
import pandas as pd
import seaborn as sns

sns.set(style="whitegrid")

TANH_SCALING = [1, 5, 10, 20, 40]
X_VALUES = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
COLUMNS = ['x', 'y', 'hue group']

tanh_df = pd.DataFrame(columns=COLUMNS)

for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[sc]}
    tanh_df = tanh_df.append(
        pd.DataFrame(data=data, columns=COLUMNS),
        ignore_index=True
    )

sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df);

However, what I get is a hue legend with values [0, 15, 30, 45], and an additional line, like so:

[Image: resulting plot]

Is this a bug or am I missing something obvious?

This is a known bug in seaborn that occurs when the hue values can be cast to integers. You can add a prefix to the hue values so that the cast to integers fails:

for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[f'A{sc}']}             # changes here
    tanh_df = tanh_df.append(
        pd.DataFrame(data=data, columns=COLUMNS),
        ignore_index=True
    )

Output:

[Image: resulting plot]
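Note that DataFrame.append was removed in pandas 2.0, so the loop above fails on current pandas. A minimal sketch of the same prefixed data built with pd.concat, assuming the constants from the question (the `frames` list is a name introduced here for illustration):

```python
import math

import pandas as pd

TANH_SCALING = [1, 5, 10, 20, 40]
X_VALUES = list(range(16))
COLUMNS = ["x", "y", "hue group"]

# Build one small frame per scaling factor, then concatenate them once.
# "frames" is an illustrative name, not part of the original answer.
frames = [
    pd.DataFrame(
        {
            COLUMNS[0]: X_VALUES,
            COLUMNS[1]: [math.tanh(x / sc) for x in X_VALUES],
            COLUMNS[2]: [f"A{sc}"] * len(X_VALUES),  # string labels, as in the answer
        }
    )
    for sc in TANH_SCALING
]
tanh_df = pd.concat(frames, ignore_index=True)

# The hue column now holds strings rather than numbers.
print(tanh_df[COLUMNS[2]].unique())
```

The resulting frame can be passed to sns.lineplot exactly as in the question.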

Or, after you have created your data, add the prefix to the hue values when plotting:

# data creation
for sc in TANH_SCALING:
    data = {
        COLUMNS[0]: X_VALUES,
        COLUMNS[1]: [math.tanh(x/sc) for x in X_VALUES],
        COLUMNS[2]: len(X_VALUES)*[sc]}                   # numeric values, as in the question
    tanh_df = tanh_df.append(
        pd.DataFrame(data=data, columns=COLUMNS),
        ignore_index=True
    )


# hue manipulation
sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], 
             hue='A_' + tanh_df[COLUMNS[2]].astype(str), # change hue here
             data=tanh_df);

As @LudvigH's comment on the other answer says, this isn't a bug, even if the default behavior is surprising in this case. As explained in the docs:

The default treatment of the hue (and to a lesser extent, size) semantic, if present, depends on whether the variable is inferred to represent “numeric” or “categorical” data. In particular, numeric variables are represented with a sequential colormap by default, and the legend entries show regular “ticks” with values that may or may not exist in the data. This behavior can be controlled through various parameters, as described and illustrated below.
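To see where legend values such as [0, 15, 30, 45] can come from, here is a minimal sketch using matplotlib's MaxNLocator. It assumes, without confirmation from the quoted docs, that the brief legend picks its entries with a tick locator of this kind. The nbins=3 setting and the limits 1 and 40 (the smallest and largest hue values in the question) are illustrative choices.

```python
from matplotlib.ticker import MaxNLocator

# Smallest and largest hue values from the question's data.
hue_min, hue_max = 1, 40

# Assumption: a locator with a small number of bins chooses the legend "ticks".
locator = MaxNLocator(nbins=3)
ticks = locator.tick_values(hue_min, hue_max)

# The locator returns evenly spaced "nice" numbers covering the range.
print([int(t) for t in ticks])
```

The values are evenly spaced round numbers spanning the data range, so they need not coincide with any hue value that actually occurs in the data.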

Here are two specific ways to control the behavior.

If you want to keep the numeric color mapping but have the legend show the exact values in your data, set legend="full":

sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df, legend="full")

[Image: resulting plot]

If you want seaborn to treat the levels of the hue variable as discrete categorical values, pass a named categorical palette, or a list or dictionary of the specific colors you want to use:

sns.lineplot(x=COLUMNS[0], y=COLUMNS[1], hue=COLUMNS[2], data=tanh_df, palette="deep")

[Image: resulting plot]
