
How can I preserve the axis order in a scatter plot when using categorical values?

I want to create a scatter plot that summarises my data in n-tiles. Because a scatter plot can't take an Interval type as an axis parameter, I convert the values to strings, but this loses the order of the intervals: the x-axis in the resulting plot is not ordered from low to high. How can I preserve the order?

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np


n_tile = 5
np.random.seed(0)
x = np.random.normal(150, 70, 3000,)
y = np.random.normal(1, 0.3, 3000)
r = np.random.normal(0.4, 0.1, 3000)

plot_data = pd.DataFrame({
            'x': x,
            'y': y,
            'r': r
                })
plot_data['x_group'] = pd.qcut(plot_data['x'], n_tile, duplicates='drop')
plot_data['y_group'] = pd.qcut(plot_data['y'], n_tile, duplicates='drop')
plot_data_grouped = plot_data.groupby(['x_group','y_group'], as_index=False).agg({'r':['mean','count']})
plot_data_grouped.columns = ['x','y','mean','count']

cmap = plt.cm.rainbow
norm = matplotlib.colors.Normalize(vmin=0, vmax=1)

plt.figure(figsize=(10,10))
plt.scatter(x=[str(x) for x in plot_data_grouped['x']], 
            y=[str(x) for x in plot_data_grouped['y']], 
            s=plot_data_grouped["count"], 
            c=plot_data_grouped['mean'], cmap="RdYlGn", edgecolors="black")
plt.show()


Sometimes it is better to upgrade your development packages. Your virtualenv has a local matplotlib installed, so activate the environment first and then upgrade matplotlib inside it.

To do this, open a terminal or command prompt with administrative privileges and upgrade pip and matplotlib using the following commands:

  • python -m pip install --upgrade pip
  • python -m pip install --upgrade matplotlib
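After upgrading, it can help to confirm which matplotlib the active environment actually imports. The following is a minimal sketch; the variable name `installed_version` is only illustrative.

```python
import matplotlib

# Report the version and install location of the matplotlib that this
# interpreter imports, to confirm the upgrade landed in the active environment.
installed_version = matplotlib.__version__
print("matplotlib", installed_version)
print("loaded from", matplotlib.__file__)
```

If the reported path is outside your virtualenv, the upgrade probably went into a different environment.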

Alternatively, matplotlib lets you get or set the current tick locations and labels of either axis (i.e. the x-axis or the y-axis).
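As a rough illustration of both the "set" and the "get" side, the sketch below places three labels at fixed positions and reads them back. The labels in `bin_labels` are hypothetical placeholders, not values from the question.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Hypothetical labels, listed in the order they should appear on the axis.
bin_labels = ['low', 'mid', 'high']

# Set: put one tick at each integer position and attach the labels in order.
plt.xticks(range(len(bin_labels)), bin_labels)

# Get: calling plt.xticks() without arguments returns the current
# tick locations and the corresponding Text label objects.
locations, label_objects = plt.xticks()
tick_positions = [int(p) for p in locations]
tick_texts = [t.get_text() for t in label_objects]
print(tick_positions, tick_texts)
```

Because the positions are plain integers, the order on the axis is whatever order the list has; matplotlib does not re-sort it.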


Here is a very simple example that places your data's labels in order along both axes. To preserve the order along the axes, set the tick positions and labels explicitly with plt.xticks and plt.yticks.

You can use this technique to solve your problem with or without upgrading matplotlib, including on the matplotlib==2.1.1 version you specified.


import matplotlib.pyplot as plt

x_axis_values = ['(-68.18100000000001, 89.754]', '(89.754, 130.42]', '(130.42, 165.601]', '(165.601, 205.456]',
                 '(205.456, 371.968]']

y_axis_values = ['(-0.123, 0.749]', '(0.749, 0.922]', '(0.922, 1.068]', '(1.068, 1.253]', '(1.253, 2.14]']

# Put the values in the order you want them along the axes
# before passing them to xticks and yticks.
# Positional arguments are used because every matplotlib release accepts them.
plt.xticks(range(len(x_axis_values)), x_axis_values)
plt.yticks(range(len(y_axis_values)), y_axis_values)

# plt.scatter(x_axis_values, y_axis_values)
plt.xlabel('Values')
plt.ylabel('Indices')

plt.show()

This example produces a figure with the labels in the given order along both the x-axis and the y-axis. Its only purpose is to show how the values are placed along both axes.


For your code, I have updated the plotting portion as follows:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np

n_tile = 5
np.random.seed(0)
x = np.random.normal(150, 70, 3000, )
y = np.random.normal(1, 0.3, 3000)
r = np.random.normal(0.4, 0.1, 3000)

plot_data = pd.DataFrame({
    'x': x,
    'y': y,
    'r': r
})
plot_data['x_group'] = pd.qcut(plot_data['x'], n_tile, duplicates='drop')
plot_data['y_group'] = pd.qcut(plot_data['y'], n_tile, duplicates='drop')
plot_data_grouped = plot_data.groupby(['x_group', 'y_group'], as_index=False).agg({'r': ['mean', 'count']})
plot_data_grouped.columns = ['x', 'y', 'mean', 'count']

cmap = plt.cm.rainbow
norm = matplotlib.colors.Normalize(vmin=0, vmax=1)

########################################################
##########  Updated Portion of the Code ################

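x_axis_values = [str(x) for x in plot_data_grouped['x']]
y_axis_values = [str(x) for x in plot_data_grouped['y']]

# Unique labels in order of first appearance. The grouped frame is already
# sorted by interval, so this keeps the bins ordered from low to high.
# Sorting the strings instead would be lexicographic and put '(89.754, ...' last.
x_labels = list(pd.Series(x_axis_values).unique())
y_labels = list(pd.Series(y_axis_values).unique())

plt.figure(figsize=(10, 10))
# One tick per unique label (5 per axis here)
plt.xticks(range(len(x_labels)), x_labels)
plt.yticks(range(len(y_labels)), y_labels)

# Plot each point at the integer position of its label so it lines up with the ticks
plt.scatter(x=[x_labels.index(v) for v in x_axis_values],
            y=[y_labels.index(v) for v in y_axis_values],
            s=plot_data_grouped["count"],
            c=plot_data_grouped['mean'], cmap="RdYlGn", edgecolors="black")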

plt.show()
########################################################

The output is now ordered as required.
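One caveat when ordering interval labels as strings: a plain lexicographic sort does not match numeric order. The sketch below uses the x-axis labels from the simple example above; `left_edge` is a hypothetical helper written only for this illustration.

```python
x_axis_values = ['(-68.18100000000001, 89.754]', '(89.754, 130.42]',
                 '(130.42, 165.601]', '(165.601, 205.456]',
                 '(205.456, 371.968]']

# Plain string sorting compares character by character, so '(89...' ends up
# after '(130...', '(165...' and '(205...'.
lexicographic = sorted(x_axis_values)


def left_edge(label):
    """Hypothetical helper: parse the left bound of a label like '(a, b]'."""
    return float(label.strip('(]').split(',')[0])


# Sorting by the parsed left bound restores the low-to-high order.
numeric = sorted(x_axis_values, key=left_edge)

print(lexicographic)
print(numeric)
```

When the labels come from pd.qcut, the categorical column already carries the correct order, so parsing strings is usually unnecessary; it is shown here only to make the difference visible.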

There are two solutions here. The simpler and better solution is to upgrade matplotlib to a newer version.

If that isn't an option, the preferred alternative is to handle the scatter plotting and the tick labelling separately, which is straightforward. For example:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np

n_tile = 5
np.random.seed(0)
x = np.random.normal(150, 70, 3000,)
y = np.random.normal(1, 0.3, 3000)
r = np.random.normal(0.4, 0.1, 3000)

plot_data = pd.DataFrame({'x': x, 'y': y, 'r': r})
plot_data['x_group'] = pd.qcut(plot_data['x'], n_tile, duplicates='drop')
plot_data['y_group'] = pd.qcut(plot_data['y'], n_tile, duplicates='drop')
plot_data_grouped = plot_data.groupby(['x_group','y_group'], as_index=False).agg({'r':['mean','count']})
plot_data_grouped.columns = ['x','y','mean','count']

cmap = plt.cm.rainbow
norm = matplotlib.colors.Normalize(vmin=0, vmax=1)

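plt.figure(figsize=(10,10))
# One integer position per bin (5 per axis), not one per grouped row
x = range(plot_data_grouped['x'].nunique())
y = range(plot_data_grouped['y'].nunique())
# indexing='ij' matches the row order of the grouped frame (x varies slowest);
# this assumes every (x, y) bin combination is present in the grouped frame
X, Y = np.meshgrid(x, y, indexing='ij')
plt.scatter(x=X.flatten(), 
            y=Y.flatten(), 
            s=plot_data_grouped["count"], 
            c=plot_data_grouped['mean'], cmap="RdYlGn", edgecolors="black")
# unique() keeps the order of appearance, which is the interval order here
plt.xticks(x, [str(e) for e in plot_data_grouped['x'].unique()])
plt.yticks(y, [str(e) for e in plot_data_grouped['y'].unique()])
plt.show()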
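A variant of the same idea that does not depend on the grouped rows forming a complete grid is to plot against the integer codes of the categorical columns. This is only a sketch under stated assumptions: `ordered_bin_positions`, `df`, and `grouped` are hypothetical names, and the data is regenerated with the same distributions as in the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def ordered_bin_positions(binned):
    """Hypothetical helper: integer codes and ordered labels of a qcut column."""
    codes = binned.cat.codes.to_numpy()                  # position of each row's bin
    labels = [str(iv) for iv in binned.cat.categories]   # bins from low to high
    return codes, labels


rng = np.random.RandomState(0)
df = pd.DataFrame({'x': rng.normal(150, 70, 3000),
                   'y': rng.normal(1, 0.3, 3000),
                   'r': rng.normal(0.4, 0.1, 3000)})
df['x_group'] = pd.qcut(df['x'], 5, duplicates='drop')
df['y_group'] = pd.qcut(df['y'], 5, duplicates='drop')

# Mean and count of r per (x bin, y bin); reset_index keeps the bins categorical.
grouped = (df.groupby(['x_group', 'y_group'], observed=True)['r']
             .agg(['mean', 'count'])
             .reset_index())

x_codes, x_labels = ordered_bin_positions(grouped['x_group'])
y_codes, y_labels = ordered_bin_positions(grouped['y_group'])

fig, ax = plt.subplots(figsize=(10, 10))
ax.scatter(x_codes, y_codes, s=grouped['count'], c=grouped['mean'],
           cmap='RdYlGn', edgecolors='black')
# Label the integer positions with the interval strings, in categorical order.
ax.set_xticks(range(len(x_labels)))
ax.set_xticklabels(x_labels)
ax.set_yticks(range(len(y_labels)))
ax.set_yticklabels(y_labels)
```

Call plt.show() afterwards to display the figure. Each row carries its own code, so a missing bin combination simply leaves a gap instead of shifting the remaining points.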


