Can I set a default value for each x tick with matplotlib and pandas?

Question

I have the following code:

# Ratings by day, divided by Staff member

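from datetime import datetime as dt

import matplotlib.pyplot as plt
import pandas as pd

# df is assumed to hold the ratings dataset linked below
by_staff = df.groupby('User ID')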

plt.figure(figsize=(15,8))

# Those are used to calculate xticks and yticks
xmin, xmax = pd.to_datetime(dt.now()), pd.to_datetime(0)
ymin, ymax = 0, 0

for index, data in by_staff:

    by_day = data.groupby('Date')
    
    x = pd.to_datetime(by_day.count().index)
    y = by_day.count()['Value']
    
    xmin = min(xmin, x.min())
    xmax = max(xmax, x.max())
    
    ymin = min(ymin, min(y))
    ymax = max(ymax, max(y))

    plt.plot_date(x, y, marker='o', label=index, markersize=12)

plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)

ticks = pd.date_range(xmin, xmax, freq='D')

plt.xticks(ticks, rotation=60)
plt.yticks(range(ymin, ymax + 1))

plt.gcf().autofmt_xdate()

plt.grid()

plt.legend([a for a, b in by_staff],
          title="Ratings given",
          loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))

plt.show()

I'd like to set the value shown at a specific xtick to 0 if there's no data for that day. Currently, this is the plot shown:

I tried some Google searches, but I can't seem to explain my problem correctly. How could I solve this?

My dataset: https://cdn.discordapp.com/attachments/311932890017693700/800789506328100934/sample-ratings.csv

Answer 1

Let's simplify the task by letting pandas aggregate the data. We group by Date and User ID simultaneously and then unstack the dataframe. This allows us to fill the missing data points with a preset value like 0. The form x = df.groupby(["Date",'User ID']).count().Value.unstack(fill_value=0) is compact chaining for a = df.groupby(["Date",'User ID']), b = a.count(), c = b.Value, x = c.unstack(fill_value=0). You can print each intermediate result of these chained pandas operations to see what it does.
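To make the chain concrete, here is a minimal sketch that unpacks it step by step on a tiny made-up frame. The sample rows are illustrative assumptions; only the column names Date, User ID, and Value come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names match the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03"]),
    "User ID": ["alice", "bob", "alice", "bob"],
    "Value": [5, 3, 4, 2],
})

a = df.groupby(["Date", "User ID"])  # group rows by (date, user) pair
b = a.count()                        # count non-null entries per pair
c = b.Value                          # keep only the 'Value' counts (Series with a MultiIndex)
x = c.unstack(fill_value=0)          # users become columns; missing pairs become 0

print(x)
```

Each row of x is a date and each column is a user, so a user who gave no rating on a date that appears in the data gets a 0 instead of a missing value.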

from matplotlib import pyplot as plt
import pandas as pd

df = pd.read_csv("test.csv", sep=",", parse_dates=["Date"])

#by_staff = df.groupby(["Date",'User ID']) - group entries by date and ID
#.count - count identical date-ID pairs
#.Value - use only this column
#.unstack(fill_value=0) bring resulting data from long to wide form
#and fill missing data with zero
by_staff = df.groupby(["Date",'User ID']).count().Value.unstack(fill_value=0)

ax = by_staff.plot(marker='o', markersize=12, linestyle="None", figsize=(15,8))

plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)

#labeling only the integer rating counts that actually occur
#(the y axis shows counts per day, not the rating values themselves)
max_count = int(by_staff.to_numpy().max())
plt.yticks(range(max_count + 1))
#this is not really necessary, it just labels zero differently
#labels = ["No rating"] + [str(i) for i in range(1, max_count + 1)]
#ax.set_yticklabels(labels)

plt.gcf().autofmt_xdate()
plt.grid()

plt.show()

Sample output:

Obviously, you don't see multiple entries.
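One caveat worth noting: unstack(fill_value=0) only fills in dates that appear for at least one user, so a day on which nobody gave a rating is still absent from the index. A possible way to cover those days too, sketched under the assumption that every calendar day between the first and last rating should appear, is to reindex onto a daily pd.date_range. The sample rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; 2021-01-02 and 2021-01-03 have no ratings at all
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-04"]),
    "User ID": ["alice", "bob", "alice"],
    "Value": [5, 3, 4],
})

by_staff = df.groupby(["Date", "User ID"]).count().Value.unstack(fill_value=0)

# Build a complete daily index from the first to the last rating day
all_days = pd.date_range(by_staff.index.min(), by_staff.index.max(), freq="D")

# Days missing from the index are added as rows filled with 0
by_staff_daily = by_staff.reindex(all_days, fill_value=0)

print(by_staff_daily)
```

The resulting frame can be plotted with .plot(...) in the same way as by_staff in the code above.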

Solution 1
1 accepted 2021-01-19 14:15:14
