
Pandas scatter plot by category and point size

So I had the idea of using a single Pandas plot to show two different data series, one on the Y axis and the other as the point size, but I wanted to categorize them, i.e., the X axis is not a numerical value but a set of categories. I'll start by illustrating my two example dataframes:

earnings:
       DayOfWeek  Hotel  Bar  Pool
    0     Sunday     41   32    15
    1     Monday     45   38    24
    2    Tuesday     42   32    27
    3  Wednesday     45   37    23
    4   Thursday     47   34    26
    5     Friday     43   30    19
    6   Saturday     48   30    28

and

tips:
   DayOfWeek  Hotel  Bar  Pool
0     Sunday      7    8     6
1     Monday      9    7     5
2    Tuesday      5    4     1
3  Wednesday      8    6     7
4   Thursday      4    5    10
5     Friday      3    1     1
6   Saturday     10    2     6

Earnings is the total earnings in the hotel, the bar and the pool, and tips is the average tip value in the same locations. I'll post my code as an answer; please feel free to improve/update.

Cheers!

See also: Customizing Plot Legends

This is the kind of plot that is suited for a grammar of graphics.

from io import StringIO

import pandas as pd
from plotnine import *

# Create data
s1 = StringIO("""
       DayOfWeek  Hotel  Bar  Pool
    0     Sunday     41   32    15
    1     Monday     45   38    24
    2    Tuesday     42   32    27
    3  Wednesday     45   37    23
    4   Thursday     47   34    26
    5     Friday     43   30    19
    6   Saturday     48   30    28

""")

s2 = StringIO("""
   DayOfWeek  Hotel  Bar  Pool
0     Sunday      7    8     6
1     Monday      9    7     5
2    Tuesday      5    4     1
3  Wednesday      8    6     7
4   Thursday      4    5    10
5     Friday      3    1     1
6   Saturday     10    2     6
""")

# Read data (raw strings keep "\s" from being treated as a string escape)
earnings = pd.read_csv(s1, sep=r"\s+")
tips = pd.read_csv(s2, sep=r"\s+")

# Make tidy data
kwargs = dict(value_vars=['Hotel', 'Bar', 'Pool'], id_vars=['DayOfWeek'], var_name='location')
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
earnings = pd.melt(earnings, value_name='earnings', **kwargs)
tips = pd.melt(tips,  value_name='tip', **kwargs)
df = pd.merge(earnings, tips, on=['DayOfWeek', 'location'])
df['DayOfWeek'] = pd.Categorical(df['DayOfWeek'], categories=days, ordered=True)

# Create plot
p = (ggplot(df)
     + geom_point(aes('DayOfWeek', 'earnings', color='location', size='tip'))
    )
print(p)

Resulting figure
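The "Make tidy data" step above is what lets a single `geom_point` map both columns at once. As a minimal sketch of that reshaping, using only two days of the question's data and a hypothetical helper named `to_long` (not part of the answer), the wide tables become one row per day and location:

```python
import pandas as pd

# Two days of the question's data, typed inline so the sketch is self-contained.
earnings = pd.DataFrame({"DayOfWeek": ["Sunday", "Monday"],
                         "Hotel": [41, 45], "Bar": [32, 38], "Pool": [15, 24]})
tips = pd.DataFrame({"DayOfWeek": ["Sunday", "Monday"],
                     "Hotel": [7, 9], "Bar": [8, 7], "Pool": [6, 5]})


def to_long(frame, value_name):
    """Hypothetical helper: turn the Hotel/Bar/Pool columns into rows."""
    return frame.melt(id_vars="DayOfWeek", var_name="location",
                      value_name=value_name)


# Each (DayOfWeek, location) pair now carries both its earnings and its tip.
tidy = to_long(earnings, "earnings").merge(to_long(tips, "tip"),
                                           on=["DayOfWeek", "location"])
print(tidy)
```

With one observation per row, `earnings` can drive the Y position, `tip` the point size and `location` the color without any looping over columns.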

Here's the code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

earnings = pd.read_csv('earnings.csv', sep=';')
tips = pd.read_csv('tips.csv', sep=';')

print(earnings)

print(tips)

earnings['index'] = earnings.index

height, width = earnings.shape

cols = list(earnings.columns.values)

colors = ['r',  'g', 'b']

# Thanks for 
# https://stackoverflow.com/questions/43812911/adding-second-legend-to-scatter-plot
plt.rcParams["figure.subplot.right"] = 0.8

plt.figure(figsize=(8,4))

# get axis
ax = plt.gca()

# plot each column, each row will be in a different X coordinate, creating a category
for x in range(1, width-1):
    earnings.plot.scatter(x='index', y=cols[x], label=None, 
                          xticks=earnings.index, c=colors[x-1],
                          s=tips[cols[x]].multiply(10), linewidth=0, ax=ax)

# This second 'dummy' plot is to create the legend. If we use the one above,
# the circles in the legend might have different sizes
for x in range(1,width-1):
    earnings.loc[:1].plot.scatter([], [], label=cols[x], c=colors[x-1], s=30,
                                  linewidth=0, ax=ax)

# Label the X ticks with the categories' names
ax.set_xticklabels(earnings.loc[:,'DayOfWeek'])

ax.set_ylabel("Total Earnings")
ax.set_xlabel("Day of Week")


leg = plt.legend(title="Location", loc=(1.03,0))
ax.add_artist(leg)

# Create a second legend for the points' scale.
h = [plt.plot([],[], color="gray", marker="o", ms=i, ls="")[0] for i in range(1,10, 2)]
plt.legend(handles=h, labels=range(1,10, 2), loc=(1.03,0.5), title="Avg. Tip")

plt.show()

# See also:
# https://jakevdp.github.io/PythonDataScienceHandbook/04.06-customizing-legends.html

Resulting figure
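As a possible variation on the code above, the size legend can be derived from the plotted points instead of being drawn by hand. This is only a sketch, assuming matplotlib 3.1 or later (where `PathCollection.legend_elements` is available) and typing the question's data inline rather than reading the CSV files; `SCALE`, `palette` and the output file name are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import pandas as pd

days = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]
earnings = pd.DataFrame({"Hotel": [41, 45, 42, 45, 47, 43, 48],
                         "Bar": [32, 38, 32, 37, 34, 30, 30],
                         "Pool": [15, 24, 27, 23, 26, 19, 28]})
tips = pd.DataFrame({"Hotel": [7, 9, 5, 8, 4, 3, 10],
                     "Bar": [8, 7, 4, 6, 5, 1, 2],
                     "Pool": [6, 5, 1, 7, 10, 1, 6]})

SCALE = 10  # assumed marker-area multiplier, same factor as the loop above
palette = {"Hotel": "r", "Bar": "g", "Pool": "b"}

# Flatten column by column so x, y, size and color stay aligned.
x, y, sizes, colors = [], [], [], []
for location in earnings.columns:
    x.extend(range(len(days)))
    y.extend(earnings[location])
    sizes.extend(tips[location] * SCALE)
    colors.extend([palette[location]] * len(days))

fig, ax = plt.subplots(figsize=(8, 4))
points = ax.scatter(x, y, s=sizes, c=colors, linewidth=0)

# Integer positions on the X axis, labelled with the category names.
ax.set_xticks(range(len(days)))
ax.set_xticklabels(days)
ax.set_xlabel("Day of Week")
ax.set_ylabel("Total Earnings")

# Legend 1: fixed-size proxy markers, one per location.
proxies = [Line2D([], [], marker="o", linestyle="", color=color, label=name)
           for name, color in palette.items()]
location_legend = ax.legend(handles=proxies, title="Location",
                            loc="upper left", bbox_to_anchor=(1.02, 0.45))
ax.add_artist(location_legend)  # keep it when the second legend is created

# Legend 2: marker sizes, with func undoing the SCALE factor for the labels.
handles, labels = points.legend_elements(prop="sizes", num=4, color="gray",
                                         func=lambda s: s / SCALE)
ax.legend(handles, labels, title="Avg. Tip",
          loc="upper left", bbox_to_anchor=(1.02, 1.0))

fig.savefig("earnings_by_day.png", bbox_inches="tight")
```

Drawing all points in one `scatter` call is what makes `legend_elements` see the full range of tip sizes; the proxy `Line2D` handles serve the same purpose as the dummy plot above, keeping the location markers a uniform size.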
