Matplotlib: How to plot clusters with different colors and annotations?

Question

Matplotlib is highly confusing to me. I have a pd.DataFrame with columns x, y and cluster (named c in the sample data below). I wish to plot this data on an xy plot, where every cluster has a different color and an annotation saying which cluster it is.

I'm capable of doing these separately. To plot the data with different colors:

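import numpy as np
import matplotlib.pyplot as plt

# Plot each cluster separately so that each gets its own color
for c in np.unique(data['c'].tolist()):
    df = data[data['c'].isin([c])]
    plt.plot(df['x'].tolist(), df['y'].tolist(), 'o')
plt.show()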

This yields:

And annotations:

fig, ax = plt.subplots()
x = data['x'].tolist()
y = data['y'].tolist()
ax.scatter(x, y)
# Annotate every point with its cluster label
for i, txt in enumerate(data['c'].tolist()):
    ax.annotate(txt, (x[i], y[i]))
plt.show()

This yields:

How do I combine the two? I don't understand how to mix the figure / axes / plot APIs all together.

Sample data:

pd.DataFrame({'c': ['News',   'Hobbies & Interests',   'Arts & Entertainment',   'Internal Use',   'Business',   'Internal Use',   'Internal Use',   'Ad Impression Fraud',   'Arts & Entertainment',   'Adult Content',   'Arts & Entertainment',   'Internal Use',   'Internal Use',   'Reference',   'News',   'Shopping',   'Food & Drink',   'Internal Use',   'Internal Use',   'Reference'],  
'x': [-95.44078826904297,   127.71454620361328,   -491.93121337890625,   184.5579071044922,   -191.46273803710938,   95.22545623779297,   272.2229919433594,   -67.099365234375,   -317.60797119140625,   -175.90196228027344,   -491.93121337890625,   214.3858642578125,   184.5579071044922,   346.4012756347656,   -151.8809051513672,   431.6130676269531,   -299.4017028808594,   184.5579071044922,   184.5579071044922,   241.29026794433594],  
'y': [-40.87070846557617,   245.00514221191406,   43.07831954956055,   -458.2991638183594,   270.4497985839844,   -453.2981262207031,   -439.6551513671875,   -206.3104248046875,   205.25787353515625,   -58.520164489746094,   43.07831954956055,   -182.91664123535156,   -458.2991638183594,   19.559282302856445,   -281.3316650390625,   103.6922378540039,   280.2445373535156,   -458.2991638183594,   -458.2991638183594,   -113.96920776367188]})

Answer 1

I'll use the df.plot.scatter syntax for convenience, but it should be (nearly) the same as ax.scatter.

Okay, so using your example data, you can specify a cmap as described in the docs:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'c': ['News',   'Hobbies & Interests',   'Arts & Entertainment',   'Internal Use',   'Business',   'Internal Use',   'Internal Use',   'Ad Impression Fraud',   'Arts & Entertainment',   'Adult Content',   'Arts & Entertainment',   'Internal Use',   'Internal Use',   'Reference',   'News',   'Shopping',   'Food & Drink',   'Internal Use',   'Internal Use',   'Reference'],  
'x': [-95.44078826904297,   127.71454620361328,   -491.93121337890625,   184.5579071044922,   -191.46273803710938,   95.22545623779297,   272.2229919433594,   -67.099365234375,   -317.60797119140625,   -175.90196228027344,   -491.93121337890625,   214.3858642578125,   184.5579071044922,   346.4012756347656,   -151.8809051513672,   431.6130676269531,   -299.4017028808594,   184.5579071044922,   184.5579071044922,   241.29026794433594],  
'y': [-40.87070846557617,   245.00514221191406,   43.07831954956055,   -458.2991638183594,   270.4497985839844,   -453.2981262207031,   -439.6551513671875,   -206.3104248046875,   205.25787353515625,   -58.520164489746094,   43.07831954956055,   -182.91664123535156,   -458.2991638183594,   19.559282302856445,   -281.3316650390625,   103.6922378540039,   280.2445373535156,   -458.2991638183594,   -458.2991638183594,   -113.96920776367188]})

df['col'] = df.c.astype('category').cat.codes

# plt.get_cmap is used because recent Matplotlib releases removed plt.cm.get_cmap
cmap = plt.get_cmap('jet', df.c.nunique())
ax = df.plot.scatter(
    x='x',y='y', c='col',
    cmap=cmap
)
plt.show()

Here get_cmap takes a colormap name (you can find the names of the various colormaps on this example page) and an integer giving the number of entries desired in the lookup table.
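To make the two ingredients concrete, here is a small sketch (not part of the original answer) of what the category codes and the lookup table contain. The labels series is made-up illustration data, and plt.get_cmap is assumed to be available in your Matplotlib version.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical labels, only for illustration
labels = pd.Series(['News', 'Business', 'News', 'Shopping'])

# Categories are sorted alphabetically, then numbered from 0
codes = labels.astype('category').cat.codes
print(codes.tolist())  # [1, 0, 1, 2]

# A colormap resampled to one entry per distinct label
cmap = plt.get_cmap('jet', labels.nunique())
print(cmap.N)  # 3

# Calling the colormap with an integer code returns an RGBA tuple
colors = [cmap(int(code)) for code in codes]
```

Points that share a label share a code, so they end up with the same color.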

The plotting code above results in the following:

If you want to add your annotations and suppress the colorbar, use:

ax = df.plot.scatter(
    x='x',y='y', c='col',
    cmap=cmap, colorbar=False
)
for i, txt in enumerate(df['c'].tolist()):
    ax.annotate(txt, (df.x[i], df.y[i]))
plt.show()

And get the following:

Hint: Use the "s" parameter in plt.scatter(x, y, s=None, c=None, **kwds) to change the marker size if the points are too small.
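As a rough illustration of the hint, the sketch below passes s through df.plot.scatter. The demo frame and the value 120 are assumptions chosen only for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical three-point frame, only for illustration
demo = pd.DataFrame({'x': [0.0, 1.0, 2.0],
                     'y': [0.0, 1.0, 0.5],
                     'col': [0, 1, 2]})

# s is the marker area in points**2; larger values give larger markers
ax = demo.plot.scatter(x='x', y='y', c='col',
                       cmap=plt.get_cmap('jet', 3),
                       colorbar=False, s=120)
```

Call plt.show() afterwards to display the figure.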

Answer 2

Surprisingly, combining the two methods also solved it:

fig, ax = plt.subplots()
fig.set_size_inches(20, 20)
x = data['x'].tolist()
y = data['y'].tolist()
ax.scatter(x, y)
# Annotate every point with its cluster label
for i, txt in enumerate(data['c'].tolist()):
    ax.annotate(txt, (x[i], y[i]))
# Draw each cluster again on the same axes so that each gets its own color
for c in np.unique(data['c'].tolist()):
    df = data[data['c'].isin([c])]
    plt.plot(df['x'].tolist(), df['y'].tolist(), 'o')
plt.show()
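
The same idea can probably be written against the axes API alone. The following is a minimal sketch, not the answerer's code: plot_clusters and the demo frame are hypothetical names introduced for illustration, and the column names x, y and c are assumed to match the sample data.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_clusters(data, label_col='c', ax=None):
    """Scatter each cluster in its own color and annotate every point."""
    if ax is None:
        _, ax = plt.subplots(figsize=(20, 20))
    # One scatter call per cluster; Matplotlib cycles the color each time
    for name, group in data.groupby(label_col):
        ax.scatter(group['x'], group['y'], label=name)
    # Label every point with its cluster name
    for row in data.itertuples(index=False):
        ax.annotate(getattr(row, label_col), (row.x, row.y))
    return ax


# Hypothetical miniature data set, only for illustration
demo = pd.DataFrame({'c': ['News', 'Business', 'News'],
                     'x': [-95.4, -191.5, -151.9],
                     'y': [-40.9, 270.4, -281.3]})
ax = plot_clusters(demo)
```

Calling plt.show() afterwards displays the figure, and ax.legend() would add a legend built from the label values.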

2 answers

Answer 1: 2, accepted, 2018-02-20 14:17:49

Answer 2: 0, 2018-02-20 15:06:46
