Can I set default values with matplotlib and pandas for each x tick?

I have the following code:

# Ratings by day, divided by Staff member

from datetime import datetime as dt

import matplotlib.pyplot as plt
import pandas as pd

by_staff = df.groupby('User ID')

plt.figure(figsize=(15,8))

# Those are used to calculate xticks and yticks
xmin, xmax = pd.to_datetime(dt.now()), pd.to_datetime(0)
ymin, ymax = 0, 0

for index, data in by_staff:

    by_day = data.groupby('Date')
    
    x = pd.to_datetime(by_day.count().index)
    y = by_day.count()['Value']
    
    xmin = min(xmin, x.min())
    xmax = max(xmax, x.max())
    
    ymin = min(ymin, min(y))
    ymax = max(ymax, max(y))

    plt.plot_date(x, y, marker='o', label=index, markersize=12)

plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)

ticks = pd.date_range(xmin, xmax, freq='D')

plt.xticks(ticks, rotation=60)
plt.yticks(range(ymin, ymax + 1))

plt.gcf().autofmt_xdate()

plt.grid()

plt.legend([a for a, b in by_staff],
          title="Ratings given",
          loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))

plt.show()

I'd like to set the value shown at a specific xtick to 0 if there's no data for the day. Currently, this is the plot shown:

[Image: the plot shown]

I tried some Google searches, but I can't seem to explain my problem correctly. How could I solve this?

My dataset: https://cdn.discordapp.com/attachments/311932890017693700/800789506328100934/sample-ratings.csv

Let's simplify the task by letting pandas aggregate the data. We group by Date and User ID simultaneously and then unstack the dataframe, which lets us fill the missing data points with a preset value such as 0.

The form x = df.groupby(["Date", "User ID"]).count().Value.unstack(fill_value=0) is compact chaining for a = df.groupby(["Date", "User ID"]), b = a.count(), c = b.Value, x = c.unstack(fill_value=0). You can print each intermediate result of these chained pandas operations to see what it does.
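As a rough sketch of those intermediate steps, here is the same chain unpacked on a tiny made-up frame. The dates, the user names alice and bob, and the rating values are invented for illustration and are not taken from the asker's dataset.

```python
import pandas as pd

# Made-up sample: three ratings over two days by two hypothetical staff members
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02"]),
    "User ID": ["alice", "alice", "bob"],
    "Value": [5, 4, 3],
})

a = df.groupby(["Date", "User ID"])  # group rows by (Date, User ID) pairs
b = a.count()                        # number of rows in each group, per remaining column
c = b.Value                          # Series indexed by a (Date, User ID) MultiIndex
x = c.unstack(fill_value=0)          # wide form: one row per date, one column per user

print(c)  # long form: only the date-user pairs that actually occur
print(x)  # wide form: absent date-user pairs are filled with 0
```

In the wide frame, alice has 2 ratings on the first day and 0 on the second, while bob has 0 and 1. The full plotting script for the actual dataset follows.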

from matplotlib import pyplot as plt
import pandas as pd

df = pd.read_csv("test.csv", sep=",", parse_dates=["Date"])

# df.groupby(["Date", "User ID"]): group entries by date and ID
# .count(): count the rows in each date-ID group
# .Value: use only this column
# .unstack(fill_value=0): bring the result from long to wide form
# and fill missing date-ID combinations with zero
by_staff = df.groupby(["Date",'User ID']).count().Value.unstack(fill_value=0)

ax = by_staff.plot(marker='o', markersize=12, linestyle="None", figsize=(15,8))

plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)

# label only whole-number counts, from 0 up to the highest daily count
plt.yticks(range(by_staff.to_numpy().max() + 1))
# this is not really necessary, it just labels zero differently
#labels = ["No rating"] + [str(i) for i in range(1, by_staff.to_numpy().max() + 1)]
#ax.set_yticklabels(labels)

plt.gcf().autofmt_xdate()
plt.grid()

plt.show()

Sample output: [image]

Note that markers overlap wherever several staff members have the same count on the same day, so you don't see multiple entries there.
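One caveat: unstack(fill_value=0) only fills date-user combinations for dates that appear somewhere in the data. A day on which nobody gave a rating is absent from the index entirely. If those days should also show as 0, one possible approach is to reindex the wide frame against a full daily range. The following is a minimal sketch on invented data; the name by_staff_daily is hypothetical, and it assumes the Date column holds one timestamp per calendar day.

```python
import pandas as pd

# Made-up sample with a gap: nobody rated anything on 2021-01-02
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-03"]),
    "User ID": ["alice", "bob", "alice"],
    "Value": [5, 4, 3],
})

by_staff = df.groupby(["Date", "User ID"]).count().Value.unstack(fill_value=0)

# Every calendar day between the first and the last rating
all_days = pd.date_range(by_staff.index.min(), by_staff.index.max(), freq="D")

# Rows for days missing from the data are added and filled with 0
by_staff_daily = by_staff.reindex(all_days, fill_value=0)

print(by_staff_daily)
```

by_staff_daily can then be plotted with the same .plot(...) call as by_staff.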
