
Pandas/matplotlib plot with date-axis shows correct day/month but wrong weekday/year

I'm loading CSV data using pandas, where one of the columns holds a date in the format '%a %d.%m.%Y' (e.g. 'Mon 06.02.2017'), and then trying to make some plots where the x-axis is labeled according to the date.

Something goes wrong during plotting, because the date labels are wrong: e.g. what was 'Mon 06.02.2017' in the CSV/DataFrame is shown as 'Thu 06.02.0048' on the plot axis.

Here is a MWE. This is the file 'data.csv':

Mon 06.02.2017  ;  1  ;  2  ;  3
Tue 07.02.2017  ;  4  ;  5  ;  6
Wed 08.02.2017  ;  7  ;  8  ;  9
Thu 09.02.2017  ; 10  ; 11  ; 12
Fri 10.02.2017  ; 13  ; 14  ; 15
Sat 11.02.2017  ; 16  ; 17  ; 18
Sun 12.02.2017  ; 19  ; 20  ; 21
Mon 13.02.2017  ; 22  ; 23  ; 24
Tue 14.02.2017  ; 25  ; 26  ; 27
Wed 15.02.2017  ; 28  ; 29  ; 30
Thu 16.02.2017  ; 31  ; 32  ; 33

And this is the parsing/plotting code 'plot.py':

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


df = pd.read_csv(
        'data.csv',
        sep=r'\s*;\s*',  # raw string so the regex backslashes are not treated as escapes
        header=None,
        names=['date', 'x', 'y', 'z'],
        parse_dates=['date'],
        # pd.datetime is no longer available in recent pandas; pd.to_datetime parses the same format
        date_parser=lambda x: pd.to_datetime(x, format='%a %d.%m.%Y'),
        # infer_datetime_format=True,
        # dayfirst=True,
        engine='python',
)

# DataFrame 'date' Series looks fine
print(df.date)

ax1 = df.plot(x='date', y='x', legend=True)
ax2 = df.plot(x='date', y='y', ax=ax1, legend=True)
ax3 = df.plot(x='date', y='z', ax=ax1, legend=True)

ax1.xaxis.set_minor_locator(mdates.DayLocator(interval=1))
ax1.xaxis.set_minor_formatter(mdates.DateFormatter('%a %d.%m.%Y'))
ax1.xaxis.grid(True, which='minor')

plt.setp(ax1.xaxis.get_minorticklabels(), rotation=45)
plt.setp(ax1.xaxis.get_majorticklabels(), visible=False)
plt.tight_layout()

plt.show()

Notice that the DataFrame.date Series seems to contain the correct dates, so it's likely a matplotlib issue rather than a pandas/parsing error.

In case it matters (although I doubt it), my locale is LC_TIME = en_US.UTF-8.

Also, according to https://www.timeanddate.com/date/weekday.html, the day 06.02.0048 was actually a Tuesday, so somehow the plotted year isn't even really year 0048.

I'm really at a loss; thanks to anyone who is willing to check this out.

Although I couldn't really figure out why it's not working, it seems to have something to do with plotting with pandas vs. solely with matplotlib, and maybe with mdates.DateFormatter ...

When I comment out the formatting lines, it seems to start working:

# ax1.xaxis.set_minor_locator(mdates.DayLocator(interval=1))
# ax1.xaxis.set_minor_formatter(mdates.DateFormatter('%a %d.%m.%Y'))
# ax1.xaxis.grid(True, which='minor')
# 
# plt.setp(ax1.xaxis.get_minorticklabels(), rotation=45)
# plt.setp(ax1.xaxis.get_majorticklabels(), visible=False)


Letting pandas plot the dates automatically works fine, but calling any of these matplotlib functions breaks the dates. Commenting out only plt.setp(ax1.xaxis.get_majorticklabels(), visible=False) plots both the pandas and the matplotlib x-axis labels, with the odd 0048 showing up again.

So the issue remains.
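
A plausible explanation, offered as an assumption rather than something confirmed in this thread: for a regularly spaced date index, pandas draws the x-axis in its own units (for daily data, days since 1970-01-01), whereas mdates.DateFormatter in older matplotlib versions reads the same number as a day count starting at 0001-01-01. The standard-library sketch below only checks that arithmetic. It reproduces the 'Thu 06.02.0048' label, and it hints at why timeanddate.com reports a different weekday: Python uses the proleptic Gregorian calendar, while that site presumably switches to the Julian calendar for such early dates.

```python
from datetime import date

csv_date = date(2017, 2, 6)  # 'Mon 06.02.2017' from data.csv

# Assumed pandas axis value for daily data: days since the Unix epoch.
days_since_1970 = (csv_date - date(1970, 1, 1)).days

# Assumed (older) matplotlib reading of that number: a proleptic Gregorian
# ordinal, where day 1 is 0001-01-01.
misread = date.fromordinal(days_since_1970)

print(days_since_1970)       # 17203
print(misread.isoformat())   # 0048-02-06
print(misread.weekday())     # 3, i.e. Thursday (Monday is 0)
```

If this reading is right, the day and month survive only by coincidence of the offset, while the year and weekday are shifted.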

However, you can circumvent this by reading the date column as the index (index_col=0) instead of as a regular column, creating a matplotlib figure explicitly, and replacing mdates.DateFormatter with ticker.FixedFormatter:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker

df = pd.read_csv(
    'data.csv',
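    sep=r'\s*;\s*',  # raw string so the regex backslashes are not treated as escapes
    header=None,
    names=['date', 'x', 'y', 'z'],
    index_col=0,
    parse_dates=True,  # parse the index as dates so that strftime() works below
    date_parser=lambda x: pd.to_datetime(x, format='%a %d.%m.%Y'),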
    engine='python'
)

ax = plt.figure().add_subplot(111)
ax.plot(df)

# One fixed label per day; this relies on the data having exactly one row per day
ticklabels = [item.strftime('%d-%m-%y') for item in df.index]
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))

plt.xticks(rotation=90)
ax.xaxis.grid(True, which='major')

plt.tight_layout()

plt.show()
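
Another option that may be worth trying, which was not tested in this answer: pandas' plot() accepts x_compat=True, which the pandas documentation describes as a way to suppress its own tick adjustment for time series so that matplotlib's date locators and formatters can be used. The sketch below uses a three-row inline sample (an assumption, not the original data.csv) and a headless backend so that it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import pandas as pd

# Small stand-in for data.csv (illustrative values only)
csv_text = "Mon 06.02.2017;1\nTue 07.02.2017;4\nWed 08.02.2017;7\n"
df = pd.read_csv(io.StringIO(csv_text), sep=";", header=None, names=["date", "x"])
df["date"] = pd.to_datetime(df["date"], format="%a %d.%m.%Y")

# x_compat=True asks pandas to leave the x-axis in matplotlib's date units
ax = df.plot(x="date", y="x", x_compat=True)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%a %d.%m.%Y"))

# Render once so the tick labels are populated, then inspect them
ax.figure.canvas.draw()
labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(labels)
```

The printed labels should then carry the year 2017 instead of 0048.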


I ran into this problem as well, but the root cause was different.

I put some debugging in the matplotlib DateFormatter class to figure out what data it was actually operating on. As it turned out, the pandas query that was running against postgres was producing date objects instead of timestamp objects. This caused the dates to be misinterpreted such that the year was incorrect (shown as year 0046 instead of 2018).

The solution was to update the query to cast the time column to a timestamp, and then everything worked correctly.

SELECT start_time::timestamp at time zone '{{timezone}}' as "Start Time" ...

That said, I'm a bit shocked that the related libraries are not robust enough to handle the different kinds of date representations that postgres can produce.
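
If changing the query is not an option, a similar conversion can presumably be done on the pandas side. The sketch below assumes a column of plain datetime.date objects (the column name and values are made up for illustration) and converts it with pd.to_datetime so that it gets a datetime64 dtype. Whether this alone fixes a given plot was not verified here.

```python
from datetime import date

import pandas as pd

# Hypothetical query result: the driver returned datetime.date objects,
# so the column has dtype 'object' rather than a datetime64 dtype.
df = pd.DataFrame({
    "Start Time": [date(2018, 3, 1), date(2018, 3, 2)],
    "value": [1, 2],
})
print(df["Start Time"].dtype)  # object

# Convert the date objects to pandas timestamps
df["Start Time"] = pd.to_datetime(df["Start Time"])
print(df["Start Time"].dtype)  # a datetime64 dtype
```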
