Can I set default values with matplotlib and pandas for each x tick?

Question

I have the following code:

# Ratings by day, divided by Staff member

from datetime import datetime as dt

import matplotlib.pyplot as plt
import pandas as pd

by_staff = df.groupby('User ID')

plt.figure(figsize=(15,8))

# Those are used to calculate xticks and yticks
xmin, xmax = pd.to_datetime(dt.now()), pd.to_datetime(0)
ymin, ymax = 0, 0

for index, data in by_staff:

    by_day = data.groupby('Date')
    
    x = pd.to_datetime(by_day.count().index)
    y = by_day.count()['Value']
    
    xmin = min(xmin, x.min())
    xmax = max(xmax, x.max())
    
    ymin = min(ymin, min(y))
    ymax = max(ymax, max(y))

    plt.plot_date(x, y, marker='o', label=index, markersize=12)

plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)

ticks = pd.date_range(xmin, xmax, freq='D')

plt.xticks(ticks, rotation=60)
plt.yticks(range(ymin, ymax + 1))

plt.gcf().autofmt_xdate()

plt.grid()

plt.legend([a for a, b in by_staff],
          title="Ratings given",
          loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))

plt.show()

I'd like to set the value shown at a specific xtick to 0 if there's no data for the day. Currently, this is the plot shown:

I tried some Google searches, but I can't seem to explain my problem correctly. How could I solve this?

My dataset: https://cdn.discordapp.com/attachments/311932890017693700/800789506328100934/sample-ratings.csv

Answer 1

Let's simplify the task by letting pandas aggregate the data. We group by Date and User ID simultaneously and then unstack the dataframe, which lets us fill the missing data points with a preset value such as 0. The expression x = df.groupby(["Date", "User ID"]).count().Value.unstack(fill_value=0) is compact chaining for a = df.groupby(["Date", "User ID"]), b = a.count(), c = b.Value, x = c.unstack(fill_value=0). You can print each intermediate result of these chained pandas operations to see what it does.
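As a minimal sketch of that suggestion, the chain can be stepped through on a tiny frame. The three rows and the user names below are made up for illustration; only the column names follow the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names match the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02"]),
    "User ID": ["alice", "bob", "alice"],
    "Value": [5, 3, 4],
})

a = df.groupby(["Date", "User ID"])  # one group per (date, user) pair
b = a.count()                        # number of rows in each group, per column
c = b["Value"]                       # Series indexed by (Date, User ID)
x = c.unstack(fill_value=0)          # users become columns, missing pairs become 0

print(c)
print(x)
```

In this toy frame "bob" has no rating on 2021-01-02, so the long-form Series c simply has no entry for that pair, while the wide-form frame x holds an explicit 0 there.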

from matplotlib import pyplot as plt
import pandas as pd

df = pd.read_csv("test.csv", sep=",", parse_dates=["Date"])

#by_staff = df.groupby(["Date",'User ID']) - group entries by date and ID
#.count - count identical date-ID pairs
#.Value - use only this column
#.unstack(fill_value=0) bring resulting data from long to wide form
#and fill missing data with zero
by_staff = df.groupby(["Date",'User ID']).count().Value.unstack(fill_value=0)

ax = by_staff.plot(marker='o', markersize=12, linestyle="None", figsize=(15,8))

plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)

#label every integer count up to the largest daily count in the table
max_count = int(by_staff.to_numpy().max())
plt.yticks(range(max_count + 1))
#this is not really necessary, it just labels zero differently
#labels = ["No rating"] + [str(i) for i in range(1, max_count + 1)]
#ax.set_yticklabels(labels)

plt.gcf().autofmt_xdate()
plt.grid()

plt.show()

Sample output:

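Note that markers overlap when several staff members have the same count on the same day, so only the marker drawn last is visible at that point.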
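One caveat: unstack(fill_value=0) only fills gaps on days where at least one staff member has a rating. A day with no ratings at all has no row in the table, so nothing is plotted for it. If those days should show 0 as well, one possible approach (an addition, not part of the answer above) is to reindex the table against a full daily range. The data below is hypothetical and the names counts, all_days and counts_daily are chosen for illustration.

```python
import pandas as pd

# Hypothetical data with a gap: nobody rated anything on 2021-01-02.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-03"]),
    "User ID": ["alice", "bob", "alice"],
    "Value": [5, 3, 4],
})

counts = df.groupby(["Date", "User ID"])["Value"].count().unstack(fill_value=0)

# One entry per calendar day between the first and the last rating;
# days missing from the table are added as rows of zeros.
all_days = pd.date_range(counts.index.min(), counts.index.max(), freq="D")
counts_daily = counts.reindex(all_days, fill_value=0)

print(counts_daily)
```

The reindexed frame can then be passed to .plot(...) in the same way as by_staff.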

solution1 1 ACCEPTED 2021-01-19 14:15:14
