Scatter plots in Pandas/Pyplot: How to plot by category
I am trying to make a simple scatter plot in pyplot using a Pandas DataFrame object, and I want an efficient way of plotting two variables while having the symbols dictated by a third column (key). I have tried various approaches using df.groupby, but without success. A sample script is below. It colours the markers according to 'key1', but I'd like to see a legend with the 'key1' categories. Am I close? Thanks.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
fig1 = plt.figure(1)
ax1 = fig1.add_subplot(111)
ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
plt.show()
You can use scatter for this, but that requires having numerical values for your key1, and you won't have a legend, as you noticed.
It's better to just use plot for discrete categories like this. For example:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)
# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))
groups = df.groupby('label')
# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()
plt.show()
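Applied to the DataFrame from the question, the same groupby loop might be wrapped up as below. This is only a sketch: scatter_by_category is a hypothetical helper name (not a pandas or matplotlib function), the date index from the question is dropped because it plays no role in the plot, and the random seed is an added assumption for reproducibility.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_by_category(df, x, y, key, ax=None):
    """Draw one set of markers per category in `key` and add a legend.

    Hypothetical helper for illustration; not part of pandas or matplotlib.
    """
    if ax is None:
        _, ax = plt.subplots()
    for name, group in df.groupby(key):
        # linestyle='' suppresses the connecting line, leaving markers only
        ax.plot(group[x], group[y], marker='o', linestyle='', label=str(name))
    ax.legend(title=key)
    return ax


# Data shaped like the question's DataFrame (seed is an assumption)
np.random.seed(0)
df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=('one', 'two', 'three'))
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

ax = scatter_by_category(df, 'one', 'two', 'key1')
```

Calling plt.show() afterwards displays the figure; each value of key1 gets its own colour from the default colour cycle and its own legend entry.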
If you'd like things to look like the default pandas style, then just update the rcParams with the pandas stylesheet and use its color generator. (I'm also tweaking the legend slightly):
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)
# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))
groups = df.groupby('label')
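# Plot
# Note: pd.tools.plotting only exists in old pandas releases; newer versions removed it
plt.rcParams.update(pd.tools.plotting.mpl_stylesheet)
colors = pd.tools.plotting._get_standard_colors(len(groups), color_type='random')
fig, ax = plt.subplots()
ax.set_prop_cycle(color=colors)  # replaces the deprecated ax.set_color_cycle(colors)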
ax.margins(0.05)
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend(numpoints=1, loc='upper left')
plt.show()
This is simple to do with Seaborn (pip install seaborn) as a one-liner, sns.scatterplot(x="one", y="two", data=df, hue="key1"):
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(1974)
df = pd.DataFrame(
np.random.normal(10, 1, 30).reshape(10, 3),
index=pd.date_range('2010-01-01', freq='M', periods=10),
columns=('one', 'two', 'three'))
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)
sns.scatterplot(x="one", y="two", data=df, hue="key1")
Here is the dataframe for reference:
Since you have three variable columns in your data, you may want to plot all pairwise dimensions with:
sns.pairplot(vars=["one","two","three"], data=df, hue="key1")
https://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/ is another option.
With plt.scatter, I can only think of one way: use a proxy artist:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
fig1 = plt.figure(1)
ax1 = fig1.add_subplot(111)
x=ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
ccm=x.get_cmap()
# Map the keys 4, 6, 8 onto 0, 0.5, 1 to look up their colours, then build one proxy marker per key
circles=[Line2D(range(1), range(1), color='w', marker='o', markersize=10, markerfacecolor=item) for item in ccm((np.array([4,6,8])-4.0)/4)]
leg = plt.legend(circles, ['4','6','8'], loc = "center left", bbox_to_anchor = (1, 0.5), numpoints = 1)
plt.show()
And the result is:
You can use df.plot.scatter and pass an array to the c= argument to define the color of each point:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
colors = np.where(df["key1"]==4,'r','-')
colors[df["key1"]==6] = 'g'
colors[df["key1"]==8] = 'b'
print(colors)
df.plot.scatter(x="one",y="two",c=colors)
plt.show()
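The per-key assignments above could also be written as a single dictionary lookup. The following is a sketch under a few assumptions: the palette dictionary is a hypothetical name introduced here, the date index from the question is omitted, and the seed is added only for reproducibility.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Data shaped like the question's DataFrame (seed is an assumption)
np.random.seed(0)
df = pd.DataFrame(np.random.normal(10, 1, 30).reshape(10, 3),
                  columns=('one', 'two', 'three'))
df['key1'] = (4, 4, 4, 6, 6, 6, 8, 8, 8, 8)

# Hypothetical mapping from each key value to a matplotlib colour code
palette = {4: 'r', 6: 'g', 8: 'b'}

# Series.map looks up every key in the dictionary, giving one colour per row
colors = df['key1'].map(palette).tolist()

ax = df.plot.scatter(x='one', y='two', c=colors)
```

Adding a new category then only needs a new dictionary entry. As with the array version, this colours the points but does not by itself create a legend.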
From matplotlib 3.1 onwards you can use .legend_elements(). An example is shown in Automated legend creation. The advantage is that a single scatter call can be used.
In this case:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3),
index = pd.date_range('2010-01-01', freq = 'M', periods = 10),
columns = ('one', 'two', 'three'))
df['key1'] = (4,4,4,6,6,6,8,8,8,8)
fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
ax.legend(*sc.legend_elements())
plt.show()
If the keys are not given directly as numbers, it would look like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3),
index = pd.date_range('2010-01-01', freq = 'M', periods = 10),
columns = ('one', 'two', 'three'))
df['key1'] = list("AAABBBCCCC")
labels, index = np.unique(df["key1"], return_inverse=True)
fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker = 'o', c = index, alpha = 0.8)
ax.legend(sc.legend_elements()[0], labels)
plt.show()
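As an alternative to np.unique with return_inverse=True, pandas offers pd.factorize, which returns the integer codes and the unique labels in one call. A minimal sketch, assuming the same "AAABBBCCCC" keys and using made-up x/y data with an arbitrary seed:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data for illustration (seed is an assumption)
np.random.seed(0)
df = pd.DataFrame(np.random.normal(10, 1, 20).reshape(10, 2),
                  columns=('one', 'two'))
df['key1'] = list("AAABBBCCCC")

# codes holds one integer per row; uniques holds the sorted category labels
codes, uniques = pd.factorize(df['key1'], sort=True)

fig, ax = plt.subplots()
sc = ax.scatter(df['one'], df['two'], marker='o', c=codes, alpha=0.8)

# legend_elements() returns (handles, labels); swap in the category names as labels
handles, _ = sc.legend_elements()
ax.legend(handles, list(uniques))
```

With sort=True the codes follow the sorted labels, so the order of the legend handles lines up with the order of the category names.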
You can also try Altair or ggplot, which are focused on declarative visualisations.
import numpy as np
import pandas as pd
np.random.seed(1974)
# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))
from altair import Chart
c = Chart(df)
c.mark_circle().encode(x='x', y='y', color='label')
from ggplot import *
ggplot(aes(x='x', y='y', color='label'), data=df) +\
geom_point(size=50) +\
theme_bw()
It's rather hacky, but you could use the column one as a Float64Index to do everything in one go:
df.set_index('one').sort_index().groupby('key1')['two'].plot(style='--o', legend=True)
Note that as of pandas 0.20.3, sorting the index is necessary, and the legend is a bit wonky.
seaborn has a wrapper function, scatterplot, that does this more efficiently.
sns.scatterplot(data=df, x='one', y='two', hue='key1')