Can I set default values with matplotlib and pandas for each x tick?

I have the following code:

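# Ratings by day, divided by Staff member

from datetime import datetime as dt

import matplotlib.pyplot as plt
import pandas as pd

# df is assumed to be loaded already from the dataset linked below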

by_staff = df.groupby('User ID')

plt.figure(figsize=(15,8))

# These are used to calculate xticks and yticks
xmin, xmax = pd.to_datetime(dt.now()), pd.to_datetime(0)
ymin, ymax = 0, 0

for index, data in by_staff:

    by_day = data.groupby('Date')
    
    x = pd.to_datetime(by_day.count().index)
    y = by_day.count()['Value']
    
    xmin = min(xmin, x.min())
    xmax = max(xmax, x.max())
    
    ymin = min(ymin, min(y))
    ymax = max(ymax, max(y))

    plt.plot_date(x, y, marker='o', label=index, markersize=12)

plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)

ticks = pd.date_range(xmin, xmax, freq='D')

plt.xticks(ticks, rotation=60)
plt.yticks(range(ymin, ymax + 1))

plt.gcf().autofmt_xdate()

plt.grid()

plt.legend([a for a, b in by_staff],
          title="Ratings given",
          loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))

plt.show()

I'd like to set the value shown at a specific xtick to 0 if there's no data for the day. Currently, this is the plot shown:

[image: the plot shown]

I tried some Google searches, but I can't seem to explain my problem correctly. How could I solve this?

My dataset: https://cdn.discordapp.com/attachments/311932890017693700/800789506328100934/sample-ratings.csv

Let's try to simplify the task by letting pandas aggregate the data. We group by Date and User ID simultaneously and then unstack the dataframe. This allows us to fill the missing data points with a preset value like 0. The form x = df.groupby(["Date", 'User ID']).count().Value.unstack(fill_value=0) is compact chaining for a = df.groupby(["Date", 'User ID']), b = a.count(), c = b.Value, x = c.unstack(fill_value=0). You can print out each intermediate result of these chained pandas operations to see what it does.
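To see what each link in that chain produces, here is a minimal sketch on a tiny made-up frame. The column names follow the question; the values are hypothetical and only serve to create one missing date/ID pair.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are made up.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02"]),
    "User ID": ["A", "A", "B", "A"],
    "Value": [5, 4, 3, 5],
})

a = df.groupby(["Date", "User ID"])  # group rows by (date, staff member)
b = a.count()                        # number of rows per group, for each remaining column
c = b.Value                          # keep only the Value counts (a Series with a MultiIndex)
x = c.unstack(fill_value=0)          # one column per User ID; absent pairs are filled with 0

# Print the intermediate results to follow the transformation.
print(b)
print(c)
print(x)
```

User B has no entry on 2021-01-02 in this sample, so that cell of x is filled by fill_value instead of being left as NaN.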

from matplotlib import pyplot as plt
import pandas as pd

df = pd.read_csv("test.csv", sep=",", parse_dates=["Date"])

# Step by step:
# df.groupby(["Date", "User ID"])  - group entries by date and ID
# .count()                         - count identical date-ID pairs
# .Value                           - use only this column
# .unstack(fill_value=0)           - bring the result from long to wide form
#                                    and fill missing data with zero
by_staff = df.groupby(["Date",'User ID']).count().Value.unstack(fill_value=0)

ax = by_staff.plot(marker='o', markersize=12, linestyle="None", figsize=(15,8))

plt.title('Ratings by day, by Staff member', fontdict = {'fontsize': 25})
plt.xlabel('Day', fontsize=15)
plt.ylabel('n° of ratings for that day', fontsize=15)

# one y tick per possible daily count, from 0 up to the largest count plotted
# (the y axis shows how many ratings were given, not the rating values themselves)
plt.yticks(range(by_staff.to_numpy().max() + 1))
# this is not really necessary, it just labels zero differently
# labels = ["No rating"] + [str(i) for i in range(1, by_staff.to_numpy().max() + 1)]
# ax.set_yticklabels(labels)

plt.gcf().autofmt_xdate()
plt.grid()

plt.show()

Sample output: [image: resulting plot]

Obviously, you don't see multiple entries where the markers of different staff members coincide.
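One caveat: unstack(fill_value=0) only fills date/ID pairs for dates that occur somewhere in the data. If a day has no ratings from anyone, it has no row at all. A possible way to cover such days, sketched here on hypothetical data, is to reindex the result against a full daily date range; the names all_days and by_staff_full are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical data with a gap: nobody gave a rating on 2021-01-02.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-03"]),
    "User ID": ["A", "B", "A"],
    "Value": [5, 3, 4],
})

by_staff = df.groupby(["Date", "User ID"]).count().Value.unstack(fill_value=0)

# Every calendar day between the first and the last rating ...
all_days = pd.date_range(by_staff.index.min(), by_staff.index.max(), freq="D")
# ... and an all-zero row for each day that is absent from the data.
by_staff_full = by_staff.reindex(all_days, fill_value=0)

print(by_staff_full)
```

by_staff_full should then work as a drop-in replacement for by_staff in the plotting call above.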
