
Plot by Categorical Group in Python Plotly

I have a pandas DataFrame with only 5 variables. I want to create a scatter plot and color the points by a categorical variable. I'm using plotly so that I can zoom in to specific regions, but plotly doesn't let me pass a list of categorical values as the color. Thank you in advance! Here is my code:

import plotly.graph_objs as go
import plotly.plotly as py
import plotly.tools

plotly.tools.set_credentials_file(username='user', api_key='key')

trace1 = go.Scatter(
    x = df['var1'],
    y = df['var2'],
    mode='markers',
    marker=dict(
        size=16,
        color = df['categorialVar'], #set color equal to a variable
        showscale=True
    )
)
data = [trace1]

py.iplot(data, filename='scatter-plot-with-colorscale')

I had this problem recently and wrote a solution:

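import itertools
import typing

import pandas as pd
from plotly import colors as plotly_colors  # assumed source of the plotly_colors alias used below


def get_random_qualitative_color_map(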
        categorial_series: pd.Series,
        colors: typing.List[str] = plotly_colors.qualitative.Alphabet
) -> typing.List[str]:
    """
    Returns a color coding for a given series (one color for every unique value). Will repeat colors if not enough are
    provided.
    :param categorial_series: A series of categorical data
    :param colors: color codes (everything plotly accepts)
    :return: List of colors, one per entry of the series, in the same order
    """
    # get unique identifiers
    unique_series = categorial_series.unique()

    # create lookup table - colors will be repeated if not enough
    color_lookup_table = dict((value, color) for (value, color) in zip(unique_series, itertools.cycle(colors)))

    # look up the colors in the table
    return [color_lookup_table[key] for key in categorial_series]
  • The solution repeats colors if there are more unique categories than colors in the palette
  • Can be used with any color palette (in this case Plotly's Alphabet palette is the default)

Explanation

unique_series = categorial_series.unique()

First we get the unique values in the series. Each of them will be matched to a color.

color_lookup_table = dict((value, color) for (value, color) in zip(unique_series, itertools.cycle(colors)))

Next we create a dict that functions as a lookup table: we can look up which color belongs to which category element. The tricky part here is the use of itertools.cycle(colors). This function returns an iterator that cycles through the values of the given iterable endlessly (in this case a list of colors as defined by Plotly).
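To make the cycling behaviour concrete, here is a minimal sketch. The three-color list is a made-up stand-in for a real palette, and islice is used only to take a finite number of items from the endless iterator.

```python
import itertools

# Hypothetical three-color palette; any list of colors plotly accepts would work.
colors = ["red", "green", "blue"]

# cycle() starts over from the first element once the list is exhausted.
# islice() takes a finite number of items from the otherwise endless iterator.
first_seven = list(itertools.islice(itertools.cycle(colors), 7))
print(first_seven)
# ['red', 'green', 'blue', 'red', 'green', 'blue', 'red']
```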

Next we zip this iterator with the actual unique items. This creates pairs of (unique_item, color). Because zip stops at the shorter input and the cycle iterator runs endlessly, we never run out of colors as long as the color list is not empty. The returned dict will therefore have len(unique_series) items.
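A small sketch of this pairing step, using made-up category names and a deliberately short color list so that the repetition is visible:

```python
import itertools

# Hypothetical inputs: five categories but only two colors.
unique_items = ["a", "b", "c", "d", "e"]
colors = ["red", "blue"]

# zip() stops at the shorter input, which is the finite list of categories,
# so the endless cycle iterator is consumed only as far as needed.
color_lookup_table = dict(zip(unique_items, itertools.cycle(colors)))
print(color_lookup_table)
# {'a': 'red', 'b': 'blue', 'c': 'red', 'd': 'blue', 'e': 'red'}
```

With an empty color list the cycle iterator yields nothing, the dict stays empty, and the later lookup would raise a KeyError.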

[color_lookup_table[key] for key in categorial_series]

Lastly we look up each entry of the series in the lookup table using a list comprehension. This creates a list of colors, one per data point. The list can then be passed as the color argument of the marker dict in a plotly.graph_objects trace such as go.Scatter.

In the end, instead of continuing to look for a solution with plotly, I stayed with the seaborn visualization library and added '%matplotlib notebook', which worked great and is easy.

%matplotlib notebook

import matplotlib.pyplot as plt
import seaborn as sns

# Plot t-SNE
sns.set_context("notebook", font_scale=1.1)
sns.set_style("ticks")

sns.lmplot(x='var1',
       y='var2',
       data=tsne_out,
       fit_reg=False,
       legend=True,
       height=9,  # this parameter was called 'size' before seaborn 0.9
       hue='categorialVar',
       scatter_kws={"s":200, "alpha":0.3})

plt.title('Plot Title', weight='bold').set_fontsize('14')
plt.xlabel('Dimension 1', weight='bold').set_fontsize('10')
plt.ylabel('Dimension 2', weight='bold').set_fontsize('10')
