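Pandas scatter plot by category and point size
I had the idea of using a single Pandas plot to show two different pieces of data, one on the Y axis and the other as the point size, but I want to categorize them, i.e. the X axis is not numeric but a set of categories. I'll start by introducing my two example dataframes: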
earnings:
DayOfWeek Hotel Bar Pool
0 Sunday 41 32 15
1 Monday 45 38 24
2 Tuesday 42 32 27
3 Wednesday 45 37 23
4 Thursday 47 34 26
5 Friday 43 30 19
6 Saturday 48 30 28
and
tips:
DayOfWeek Hotel Bar Pool
0 Sunday 7 8 6
1 Monday 9 7 5
2 Tuesday 5 4 1
3 Wednesday 8 6 7
4 Thursday 4 5 10
5 Friday 3 1 1
6 Saturday 10 2 6
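earnings holds the total earnings of the hotel, bar, and pool, and tips holds the average tip value at the same locations. I'll post my code as an answer; feel free to improve/update it.
Cheers!
See also: Customizing Plot Legends
This is the kind of plot that suits a grammar of graphics.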
from io import StringIO
import pandas as pd
from plotnine import aes, geom_point, ggplot
# Create data
s1 = StringIO("""
DayOfWeek Hotel Bar Pool
0 Sunday 41 32 15
1 Monday 45 38 24
2 Tuesday 42 32 27
3 Wednesday 45 37 23
4 Thursday 47 34 26
5 Friday 43 30 19
6 Saturday 48 30 28
""")
s2 = StringIO("""
DayOfWeek Hotel Bar Pool
0 Sunday 7 8 6
1 Monday 9 7 5
2 Tuesday 5 4 1
3 Wednesday 8 6 7
4 Thursday 4 5 10
5 Friday 3 1 1
6 Saturday 10 2 6
""")
# Read data
earnings = pd.read_csv(s1, sep=r"\s+")
tips = pd.read_csv(s2, sep=r"\s+")
# Make tidy data
kwargs = dict(value_vars=['Hotel', 'Bar', 'Pool'], id_vars=['DayOfWeek'], var_name='location')
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
earnings = pd.melt(earnings, value_name='earnings', **kwargs)
tips = pd.melt(tips, value_name='tip', **kwargs)
df = pd.merge(earnings, tips, on=['DayOfWeek', 'location'])
df['DayOfWeek'] = pd.Categorical(df['DayOfWeek'], categories=days, ordered=True)
# Create plot
p = (ggplot(df)
     + geom_point(aes('DayOfWeek', 'earnings', color='location', size='tip'))
)
print(p)
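To make the "tidy data" step concrete, here is a minimal pandas-only sketch of the melt and merge, without the plotting. The helper name `to_long` and the variable `tidy` are made up for illustration and are not part of the answer; the numbers are copied from the question's two tables.

```python
import pandas as pd

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

# Same values as the question's earnings and tips tables
earnings = pd.DataFrame({
    'DayOfWeek': days,
    'Hotel': [41, 45, 42, 45, 47, 43, 48],
    'Bar': [32, 38, 32, 37, 34, 30, 30],
    'Pool': [15, 24, 27, 23, 26, 19, 28],
})
tips = pd.DataFrame({
    'DayOfWeek': days,
    'Hotel': [7, 9, 5, 8, 4, 3, 10],
    'Bar': [8, 7, 4, 6, 5, 1, 2],
    'Pool': [6, 5, 1, 7, 10, 1, 6],
})


def to_long(frame, value_name):
    """Hypothetical helper: one row per (DayOfWeek, location) pair."""
    return frame.melt(id_vars='DayOfWeek',
                      value_vars=['Hotel', 'Bar', 'Pool'],
                      var_name='location',
                      value_name=value_name)


# Join the two long frames so each row carries both the earnings and the tip
tidy = to_long(earnings, 'earnings').merge(to_long(tips, 'tip'),
                                           on=['DayOfWeek', 'location'])
print(tidy.head())
```

Each of the 7 days times 3 locations becomes one row with an `earnings` and a `tip` column, which is the shape a grammar-of-graphics library expects when mapping columns to x, y, color, and size.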
Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
earnings = pd.read_csv('earnings.csv', sep=';')
tips = pd.read_csv('tips.csv', sep=';')
print(earnings)
print(tips)
earnings['index'] = earnings.index
height, width = earnings.shape
cols = list(earnings.columns.values)
colors = ['r', 'g', 'b']
# Thanks for
# https://stackoverflow.com/questions/43812911/adding-second-legend-to-scatter-plot
plt.rcParams["figure.subplot.right"] = 0.8
plt.figure(figsize=(8,4))
# get axis
ax = plt.gca()
# plot each column, each row will be in a different X coordinate, creating a category
for x in range(1, width-1):
    earnings.plot.scatter(x='index', y=cols[x], label=None,
                          xticks=earnings.index, c=colors[x-1],
                          s=tips[cols[x]].multiply(10), linewidth=0, ax=ax)
# This second 'dummy' plot is to create the legend. If we use the one above,
# the circles in the legend might have different sizes
for x in range(1, width-1):
    earnings.loc[:1].plot.scatter([], [], label=cols[x], c=colors[x-1], s=30,
                                  linewidth=0, ax=ax)
# Label the X ticks with the categories' names
ax.set_xticklabels(earnings.loc[:,'DayOfWeek'])
ax.set_ylabel("Total Earnings")
ax.set_xlabel("Day of Week")
leg = plt.legend(title="Location", loc=(1.03,0))
ax.add_artist(leg)
# Create a second legend for the points' scale.
h = [plt.plot([],[], color="gray", marker="o", ms=i, ls="")[0] for i in range(1,10, 2)]
plt.legend(handles=h, labels=range(1,10, 2), loc=(1.03,0.5), title="Avg. Tip")
plt.show()
# See also:
# https://jakevdp.github.io/PythonDataScienceHandbook/04.06-customizing-legends.html
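As a possible simplification, here is a minimal sketch that passes the day names straight to matplotlib's `Axes.scatter`, which accepts strings as categorical x values, so the numeric index column and the manual tick relabeling are not needed. This is not the code from the answer above: the function name `plot_earnings` and the `scale` parameter are made up for illustration, the data is rebuilt inline from the question's tables instead of being read from CSV, and the `Agg` backend line is only an assumption so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

# Same values as the question's tables, indexed by day name
earnings = pd.DataFrame({'Hotel': [41, 45, 42, 45, 47, 43, 48],
                         'Bar': [32, 38, 32, 37, 34, 30, 30],
                         'Pool': [15, 24, 27, 23, 26, 19, 28]}, index=days)
tips = pd.DataFrame({'Hotel': [7, 9, 5, 8, 4, 3, 10],
                     'Bar': [8, 7, 4, 6, 5, 1, 2],
                     'Pool': [6, 5, 1, 7, 10, 1, 6]}, index=days)


def plot_earnings(earnings, tips, scale=10):
    """Hypothetical helper: one scatter series per location, sized by tip."""
    fig, ax = plt.subplots(figsize=(8, 4))
    for location in earnings.columns:
        # String x values are treated as categories by matplotlib
        ax.scatter(list(earnings.index), earnings[location],
                   s=tips[location] * scale, label=location, linewidths=0)
    ax.set_xlabel("Day of Week")
    ax.set_ylabel("Total Earnings")
    ax.legend(title="Location", loc="center left", bbox_to_anchor=(1.02, 0.5))
    return fig, ax


fig, ax = plot_earnings(earnings, tips)
# Call plt.show() or fig.savefig(...) here to see the result
```

Note that this sketch takes the legend markers directly from the data series, so their sizes may differ between locations; the dummy-plot trick in the code above exists to avoid exactly that, and the second "Avg. Tip" legend is also left out here.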