Plot by Categorical Group in Python Plotly
I have a pandas DataFrame with only 5 variables. I want to create a scatter plot and color the points by a categorical variable. I'm using plotly so I can zoom in to specific regions. Plotly doesn't allow me to pass a list of categorical values as a color. Thank you in advance! Here is my code:
import plotly.graph_objs as go
import plotly.plotly as py
import plotly.tools

plotly.tools.set_credentials_file(username='user', api_key='key')

trace1 = go.Scatter(
    x=df['var1'],
    y=df['var2'],
    mode='markers',
    marker=dict(
        size=16,
        color=df['categorialVar'],  # set color equal to a variable
        showscale=True
    )
)
data = [trace1]
py.iplot(data, filename='scatter-plot-with-colorscale')
I had this problem recently and came up with a solution:
import itertools
import typing

import pandas as pd
import plotly.colors as plotly_colors


def get_random_qualitative_color_map(
        categorial_series: pd.Series,
        colors: typing.List[str] = plotly_colors.qualitative.Alphabet
) -> typing.List[str]:
    """
    Returns a color coding for a given series (one color for every unique value).
    Will repeat colors if not enough are provided.

    :param categorial_series: A series of categorical data
    :param colors: color codes (everything plotly accepts)
    :return: List of colors, one per entry of the series, in the same order
    """
    # get unique identifiers
    unique_series = categorial_series.unique()
    # create lookup table - colors will be repeated if not enough
    color_lookup_table = dict((value, color) for (value, color) in zip(unique_series, itertools.cycle(colors)))
    # look up the colors in the table
    return [color_lookup_table[key] for key in categorial_series]
unique_series = categorial_series.unique()
First we get the unique values in the series. Each of them will be matched to a color.
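As a small illustration of this first step, the sketch below calls unique() on a made-up series (the values are hypothetical, not from the question). It shows that pandas returns the distinct values in order of first appearance rather than sorted, which is the order in which colors get assigned later.

```python
import pandas as pd

# Hypothetical categorical column with repeated values.
categorial_series = pd.Series(["b", "a", "b", "c", "a"])

# unique() keeps the order of first appearance; it does not sort.
unique_series = categorial_series.unique()
print(list(unique_series))  # ['b', 'a', 'c']
```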
color_lookup_table = dict((value, color) for (value, color) in zip(unique_series, itertools.cycle(colors)))
Next we create a dict that functions as a lookup table: we can look up which color belongs to which category element. The tricky part here is the use of itertools.cycle(colors). This function returns an iterator that cycles endlessly through all the values in the given iterable (in this case a list of colors as defined by plotly).
Next we zip this iterator with the actual unique items. This creates pairs of (unique_item, color). We get the nice effect of never running out of colors, because the cycle iterator runs endlessly. Since zip stops as soon as the unique items are exhausted, the returned dict has len(unique_series) items.
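The interplay of cycle and zip can be seen in isolation with a minimal sketch. The category names and the two-color palette below are made up for illustration; there are deliberately more categories than colors.

```python
import itertools

# Hypothetical inputs: three categories but only two colors.
unique_values = ["a", "b", "c"]
colors = ["red", "blue"]

# cycle() repeats the colors endlessly; zip() stops after the last unique value.
color_lookup_table = dict(zip(unique_values, itertools.cycle(colors)))

print(color_lookup_table)  # {'a': 'red', 'b': 'blue', 'c': 'red'}
```

The third category wraps around to the first color again, so with a palette that is too short, distinct categories can end up sharing a color.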
[color_lookup_table[key] for key in categorial_series]
Lastly we look up each entry of the series in the lookup table using a list comprehension. This creates a list of colors for the data points. The list can then be passed as the color argument in the marker dict of any plotly.graph_objects trace.
So instead of continuing to look for a solution with plotly, I stayed with the seaborn visualization library and added '%matplotlib notebook', which worked great and is easy.
%matplotlib notebook
import matplotlib.pyplot as plt
import seaborn as sns

# Plot t-SNE
sns.set_context("notebook", font_scale=1.1)
sns.set_style("ticks")
sns.lmplot(x='var1',
           y='var2',
           data=tsne_out,
           fit_reg=False,  # scatter only, no regression line
           legend=True,
           height=9,  # this parameter was called `size` before seaborn 0.9
           hue='categorialVar',
           scatter_kws={"s": 200, "alpha": 0.3})
plt.title('Plot Title', weight='bold', fontsize=14)
plt.xlabel('Dimension 1', weight='bold', fontsize=10)
plt.ylabel('Dimension 2', weight='bold', fontsize=10)