I have the following code:
# Ratings by day, divided by Staff member
from datetime import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd
by_staff = df.groupby('User ID')
plt.figure(figsize=(15,8))
# These are used to calculate the xticks and yticks
xmin, xmax = pd.to_datetime(dt.now()), pd.to_datetime(0)
ymin, ymax = 0, 0
for index, data in by_staff:
    by_day = data.groupby('Date')
    x = pd.to_datetime(by_day.count().index)
    y = by_day.count()['Value']
    xmin = min(xmin, x.min())
    xmax = max(xmax, x.max())
    ymin = min(ymin, min(y))
    ymax = max(ymax, max(y))
    plt.plot_date(x, y, marker='o', label=index, markersize=12)
plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)
ticks = pd.date_range(xmin, xmax, freq='D')
plt.xticks(ticks, rotation=60)
plt.yticks(range(ymin, ymax + 1))
plt.gcf().autofmt_xdate()
plt.grid()
plt.legend([a for a, b in by_staff],
           title="Ratings given",
           loc="center left",
           bbox_to_anchor=(1, 0, 0.5, 1))
plt.show()
I'd like the value plotted at a given xtick (day) to be 0 when a staff member has no data for that day. Currently, this is the plot shown:
[plot image not available]
I tried some Google searches, but I can't seem to explain my problem correctly. How could I solve this?
My dataset: https://cdn.discordapp.com/attachments/311932890017693700/800789506328100934/sample-ratings.csv
Let's simplify the task by letting pandas aggregate the data. We group by Date and User ID simultaneously and then unstack the dataframe, which lets us fill the missing data points with a preset value such as 0. The expression
x = df.groupby(["Date", 'User ID']).count().Value.unstack(fill_value=0)
is compact chaining for
a = df.groupby(["Date", 'User ID'])
b = a.count()
c = b.Value
x = c.unstack(fill_value=0)
You can print out each intermediate result of these chained pandas operations to see what it does.
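To see what the chain produces, here is a minimal sketch on a small made-up DataFrame. The dates, the User IDs alice and bob, and the rating values are hypothetical, not taken from the linked dataset; only the column names Date, User ID and Value match the question.

```python
import pandas as pd

# Made-up sample data (hypothetical): each row is one rating
# given by a staff member on a given day.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03"]),
    "User ID": ["alice", "bob", "alice", "bob"],
    "Value": [5, 3, 4, 2],
})

a = df.groupby(["Date", "User ID"])  # groups keyed by (Date, User ID) pairs
b = a.count()                        # number of non-null entries per pair, per column
c = b.Value                          # keep only the counts taken from the Value column
x = c.unstack(fill_value=0)          # long -> wide: one column per User ID, 0 where missing

print(x)
```

In the wide result, bob's column is 0 on 2021-01-02 and alice's column is 0 on 2021-01-03: exactly the points the per-user loop in the question never draws.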
from matplotlib import pyplot as plt
import pandas as pd
df = pd.read_csv("test.csv", sep=",", parse_dates=["Date"])
#by_staff = df.groupby(["Date",'User ID']) - group entries by date and ID
#.count - count identical date-ID pairs
#.Value - use only this column
#.unstack(fill_value=0) bring resulting data from long to wide form
#and fill missing data with zero
by_staff = df.groupby(["Date",'User ID']).count().Value.unstack(fill_value=0)
ax = by_staff.plot(marker='o', markersize=12, linestyle="None", figsize=(15,8))
plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)
#label only whole-number counts, from 0 up to the largest daily count
plt.yticks(range(by_staff.values.max() + 1))
#this is not really necessary, it just labels zero differently
#labels = ["No rating"] + [str(i) for i in range(1, by_staff.values.max() + 1)]
#ax.set_yticklabels(labels)
plt.gcf().autofmt_xdate()
plt.grid()
plt.show()
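One caveat: unstack(fill_value=0) only fills gaps for dates that occur somewhere in the data. If no staff member rated anything on a given day, that day is missing from the index altogether, whereas the question's xticks covered every day from the first to the last. A minimal sketch, assuming the Date column holds plain calendar dates with no time component, that adds those days by reindexing over a daily pd.date_range (the sample data is made up):

```python
import pandas as pd

# Hypothetical sample data: nobody rated anything on 2021-01-02.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-03"]),
    "User ID": ["alice", "bob", "alice"],
    "Value": [5, 3, 4],
})

by_staff = df.groupby(["Date", "User ID"]).count().Value.unstack(fill_value=0)

# Every calendar day between the first and last rating, including days
# on which no staff member rated anything; those rows are filled with 0.
all_days = pd.date_range(by_staff.index.min(), by_staff.index.max(), freq="D", name="Date")
by_staff = by_staff.reindex(all_days, fill_value=0)

print(by_staff)
```

The reindexed frame can then be passed to by_staff.plot(...) exactly as in the answer code.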
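Note that when several staff members have the same count on the same day, their markers are drawn on top of each other, so only the one plotted last is visible.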
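When several staff members share the same count on the same day, their markers coincide. One possible workaround (a suggestion, not part of the original answer) is to draw each staff member's markers a bit smaller than the previous one's, so a point hidden underneath still shows as a ring. A sketch using a hypothetical by_staff table with made-up counts, plotted column by column with ax.plot instead of DataFrame.plot:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wide table shaped like by_staff: one column per User ID,
# 0 where a staff member gave no ratings that day.
by_staff = pd.DataFrame(
    {"alice": [1, 2, 0], "bob": [1, 0, 0], "carol": [1, 2, 1]},
    index=pd.date_range("2021-01-01", periods=3, freq="D", name="Date"),
)

fig, ax = plt.subplots(figsize=(15, 8))
n = len(by_staff.columns)
for i, user in enumerate(by_staff.columns):
    # Earlier columns get bigger markers, so a later (smaller) marker on the
    # same point leaves a visible ring of the earlier one around it.
    ax.plot(by_staff.index, by_staff[user], marker="o", linestyle="None",
            markersize=6 + 4 * (n - i), label=user)

ax.set_yticks(range(by_staff.values.max() + 1))  # whole-number counts only
ax.legend(title="Ratings given")
fig.autofmt_xdate()
```

Call plt.show() afterwards to display the figure. Transparency (the alpha argument) is another option, but blended colors are usually harder to read than nested rings.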