
How to plot multiple time series with different start dates on the same x-axis in Python Matplotlib?

I am trying to plot three time series datasets with different start dates on the same x-axis, similar to the question "How to plot timeseries with different start date on the same x axis", except that my x-axis has dates instead of days.

My data frame is structured as follows:

Date        ColA  Label
01/01/2019  1.0   Training
02/01/2019  1.0   Training
...
14/09/2020  2.0   Test1
...
06/01/2021  4.0   Test2
...

I have defined each time series as:

train = df.loc['01/01/2019':'05/08/2020', 'ColA']  
test1 = df.loc['14/09/2020':'20/12/2020', 'ColA']  
test2 = df.loc['06/01/2021':'18/03/2021', 'ColA']  

This is how the individual time series plot: [images: Data 1, Data 2, Data 3]

But when I try to plot them on the same x-axis, they are not plotted in date order: [image: all data]. I am hoping to produce something like this (made in MS Excel): [image]

Any help would be great!

Thank you

Make sure that the 'Date' column in your dataframe is imported as a datetime variable and not as a string.

If you find that its dtype is "object":

import pandas as pd

df = pd.read_csv('data.csv')
df['Date']
0      2019-01-01
1      2019-01-02
2      2019-01-03
       ...
Name: Date, Length: 830, dtype: object

You need to convert it to a datetime variable. You can do this in two ways:

  1. df = pd.read_csv('data.csv', parse_dates=['Date'])

OR

  2. df = pd.read_csv('data.csv')
     df['Date'] = pd.to_datetime(df['Date'])

Both options give the same result:

df = pd.read_csv('data.csv', parse_dates=['Date'])
df['Date']
0      2019-01-01
1      2019-01-02
2      2019-01-03
       ...
Name: Date, Length: 830, dtype: datetime64[ns]
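A self-contained sketch of both conversions follows. It assumes a small in-memory CSV (via `io.StringIO`) as a stand-in for `data.csv`; the rows are hypothetical and use the question's DD/MM/YYYY layout. For dates in that layout, passing `dayfirst=True` to `read_csv`, or an explicit `format` to `to_datetime`, keeps `02/01/2019` from being read as 1 February.

```python
import io

import pandas as pd

# Hypothetical rows in the question's DD/MM/YYYY layout (a stand-in for data.csv).
CSV_TEXT = """Date,ColA,Label
01/01/2019,1.0,Training
02/01/2019,1.0,Training
14/09/2020,2.0,Test1
06/01/2021,4.0,Test2
"""

# Option 1: parse while reading; dayfirst=True reads 02/01/2019 as 2 January.
df1 = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=['Date'], dayfirst=True)

# Option 2: read the strings first, then convert with an explicit format.
df2 = pd.read_csv(io.StringIO(CSV_TEXT))
df2['Date'] = pd.to_datetime(df2['Date'], format='%d/%m/%Y')

print(df1['Date'].dtype)                   # a datetime64 dtype
print((df1['Date'] == df2['Date']).all())  # True
```

An explicit `format` is the stricter choice: a row that does not match raises an error instead of being silently parsed month-first.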

Then you can simply plot:

import matplotlib.pyplot as plt
plt.plot(df['Date'], df['ColA'])
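A runnable sketch of this single-call plot is shown below. It assumes hypothetical daily data built with `pd.date_range` for the three date ranges from the question, with a constant `ColA` value per segment, and uses the off-screen `Agg` backend so it runs without a display. The `DateFormatter` and `autofmt_xdate` calls are optional tick cosmetics.

```python
import matplotlib

matplotlib.use('Agg')  # off-screen backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily data: three segments with gaps, mirroring the question.
segments = [
    ('Training', '2019-01-01', '2020-08-05', 1.0),
    ('Test1', '2020-09-14', '2020-12-20', 2.0),
    ('Test2', '2021-01-06', '2021-03-18', 4.0),
]
df = pd.concat(
    [pd.DataFrame({'Date': pd.date_range(start, end, freq='D'),
                   'ColA': value, 'Label': label})
     for label, start, end, value in segments],
    ignore_index=True,
)

fig, ax = plt.subplots(figsize=(10, 4))
# 'Date' is datetime64, so every point lands on one chronological date axis.
ax.plot(df['Date'], df['ColA'])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
# With an interactive backend, drop matplotlib.use('Agg') and call plt.show().
```

Because this draws a single line, matplotlib joins the segments with straight strokes across the gaps between date ranges; plotting each Label separately avoids that.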

When you define the individual time series, check the format of the date strings. pandas displays datetimes as YYYY-MM-DD, and strings in that ISO format are unambiguous when slicing; a string such as '05/08/2020' is read month-first (8 May 2020) rather than as 5 August. So use this instead:

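df = df.set_index('Date')  # if 'Date' is still a column: date-label slicing needs it as the index
train = df.loc['2019-01-01':'2020-08-05', 'ColA']  # and so on for test1, test2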
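The sketch below shows the ISO-string slicing on a `DatetimeIndex`. It assumes hypothetical daily data spanning the question's overall range, with `ColA` simply counting rows. With a `DatetimeIndex`, both slice endpoints are inclusive.

```python
import pandas as pd

# Hypothetical daily data covering the question's overall date range.
dates = pd.date_range('2019-01-01', '2021-03-18', freq='D')
df = pd.DataFrame({'ColA': range(len(dates))}, index=pd.Index(dates, name='Date'))

# ISO strings (YYYY-MM-DD) are unambiguous; both endpoints are included.
train = df.loc['2019-01-01':'2020-08-05', 'ColA']
test1 = df.loc['2020-09-14':'2020-12-20', 'ColA']
test2 = df.loc['2021-01-06':'2021-03-18', 'ColA']

print(len(train), len(test1), len(test2))  # 583 98 72
```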

I am assuming that your data is stored as CSV (or Excel). If so, be aware that MS Excel may change the format of the Date column whenever you open the file in Excel. Best practice is to always check the dtype of the 'Date' column after importing the dataframe, using

df['Date'].dtype
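A small sketch of why `.dtype` is the check to run, rather than `type()`. The two rows are hypothetical, and `pd.api.types.is_datetime64_any_dtype` is used as a convenient boolean form of the same check.

```python
import pandas as pd

# Hypothetical rows; in practice df comes from pd.read_csv(...).
df = pd.DataFrame({'Date': ['01/01/2019', '02/01/2019'], 'ColA': [1.0, 1.0]})

# type() only names the container (a pandas Series), not what it holds.
print(type(df['Date']).__name__)  # Series
# .dtype, or this helper, reports whether the values are real datetimes.
print(pd.api.types.is_datetime64_any_dtype(df['Date']))  # False

df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
print(pd.api.types.is_datetime64_any_dtype(df['Date']))  # True
```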

I assume you have a dataframe consisting of at least date, record, and label columns, where label marks the training, test #1 and test #2 rows.
Would sharex=True do the trick?

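import matplotlib.pyplot as plt

fig, ax = plt.subplots(3, 1, sharex=True)

# One subplot per label; sharex=True gives all three the same date axis.
for j, label in enumerate(df['label'].unique()):
    subset = df[df['label'] == label]
    ax[j].plot(subset['date'], subset['record'])  # x and y are positional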
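To try the `sharex=True` layout end to end, the sketch below builds a hypothetical frame with this answer's assumed column names (`date`, `record`, `label`), using the question's date ranges and a running counter as `record`, and renders with the off-screen `Agg` backend. The per-subplot titles are an addition for readability.

```python
import matplotlib

matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frame with the column names assumed in this answer.
parts = [
    ('Training', '2019-01-01', '2020-08-05'),
    ('Test1', '2020-09-14', '2020-12-20'),
    ('Test2', '2021-01-06', '2021-03-18'),
]
df = pd.concat(
    [pd.DataFrame({'date': pd.date_range(start, end, freq='D'), 'label': name})
     for name, start, end in parts],
    ignore_index=True,
)
df['record'] = range(len(df))

labels = df['label'].unique()
fig, ax = plt.subplots(len(labels), 1, sharex=True, figsize=(10, 6))
for j, name in enumerate(labels):
    subset = df[df['label'] == name]
    ax[j].plot(subset['date'], subset['record'])
    ax[j].set_title(name)  # label each panel with its segment name
```

Each segment gets its own panel, but because the x-axis is shared, each segment appears in its chronological position, with empty space where the other segments fall.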

EDIT

This should do it:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(14, 6))
colors = ['blue', 'red', 'orange']

for label, color in zip(df['Label'].unique().tolist(), colors):
    subset = df[df['Label'] == label]
    ax.plot(subset['Date'], subset['ColA'], color=color, label=label)  # legend shows the Label
ax.legend(loc='best')
plt.show()

You basically want to plot several times in the same matplotlib figure. Just use the initial dataset (which includes all the labels); there is no need to use the separated ones.
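The same single-axes idea can also be written with `groupby`, which hands over each label's sub-frame directly. This is a swapped-in variant, not the answer's exact code. It assumes a hypothetical frame with the question's column names, the question's date ranges, and constant `ColA` values per segment. `sort=False` keeps the groups in order of first appearance.

```python
import matplotlib

matplotlib.use('Agg')  # off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frame with the question's column names.
parts = [
    ('Training', '2019-01-01', '2020-08-05', 1.0),
    ('Test1', '2020-09-14', '2020-12-20', 2.0),
    ('Test2', '2021-01-06', '2021-03-18', 4.0),
]
df = pd.concat(
    [pd.DataFrame({'Date': pd.date_range(start, end, freq='D'),
                   'ColA': value, 'Label': label})
     for label, start, end, value in parts],
    ignore_index=True,
)

fig, ax = plt.subplots(figsize=(14, 6))
colors = ['blue', 'red', 'orange']
# groupby yields (label, sub-frame) pairs; one line per label on one date axis.
for (label, group), color in zip(df.groupby('Label', sort=False), colors):
    ax.plot(group['Date'], group['ColA'], color=color, label=label)
ax.legend(loc='best')
```

Since each label is a separate line, no connecting strokes are drawn across the gaps between segments.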
