How to plot data with gaps into subplots
I have a dataframe with gaps:
                     temperature
data
2016-01-01 01:00:00         -8.2
2016-01-01 02:00:00         -8.3
2016-01-01 03:00:00         -9.1
2016-01-01 04:00:00         -9.1
2016-01-01 05:00:00         -9.6
...                          ...
2020-02-29 20:00:00          5.9
2020-02-29 21:00:00          5.4
2020-02-29 22:00:00          4.7
2020-02-29 23:00:00          4.3
2020-03-01 00:00:00          4.3
Here is the code for some sample data, different from mine but the concept is the same:
import random
import pandas as pd

def tworzeniedaty():
    rng1 = list(pd.date_range(start='2016-01-01', end='2016-02-29', freq='D'))
    rng2 = list(pd.date_range(start='2016-12-15', end='2017-02-28', freq='D'))
    rng3 = list(pd.date_range(start='2017-12-15', end='2018-02-28', freq='D'))
    rng4 = list(pd.date_range(start='2018-12-15', end='2019-02-28', freq='D'))
    rng5 = list(pd.date_range(start='2019-12-15', end='2020-02-29', freq='D'))
    return rng1 + rng2 + rng3 + rng4 + rng5

lista = [random.randrange(1, 10, 1) for i in range(len(tworzeniedaty()))]
df = pd.DataFrame({'Date': tworzeniedaty(), 'temperature': lista})
df['Date'] = pd.to_datetime(df['Date'], format="%Y/%m/%d")
When I plot the data I get a very messy plot.
Instead I would like to get:
It is the same question as "How to plot only specific months in a time series of several years?", but I would like to do it in Python and can't decipher the R code.
I think the best approach is to filter the data, as the R code does for Jun/Jul/Aug. The following plots each calendar year in its own subplot:
import random
import matplotlib.pyplot as plt
import pandas as pd

def tworzeniedaty():
    rng1 = list(pd.date_range(start='2016-01-01', end='2016-02-29', freq='D'))
    rng2 = list(pd.date_range(start='2016-12-15', end='2017-02-28', freq='D'))
    rng3 = list(pd.date_range(start='2017-12-15', end='2018-02-28', freq='D'))
    rng4 = list(pd.date_range(start='2018-12-15', end='2019-02-28', freq='D'))
    rng5 = list(pd.date_range(start='2019-12-15', end='2020-02-29', freq='D'))
    return rng1 + rng2 + rng3 + rng4 + rng5

dates = tworzeniedaty()
lista = [random.randrange(1, 10, 1) for _ in dates]
df = pd.DataFrame({'Date': dates, 'temperature': lista})

# sorted() keeps the subplots in chronological order; a bare set has no guaranteed order
years = sorted(set(df.Date.dt.year))
# squeeze=False always returns a 2D array of axes, even for a single year
fig, ax = plt.subplots(1, len(years), squeeze=False)
for col, year in enumerate(years):
    # set_index returns a new frame, so no slice of df is modified in place
    df_set = df[df.Date.dt.year == year].set_index("Date")
    # string labels make the x-axis categorical, so the gap inside a year is not drawn
    df_set.index = df_set.index.map(str)
    ax[0, col].plot(df_set)
    ax[0, col].set_title(str(year))
plt.show()
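Splitting by calendar year puts December into the same panel as the preceding January and February. If each winter should share one panel instead, one option is to label every row with a season year and group on that. A minimal sketch, assuming a winter runs from December to February as in the sample data; `season_year` is a name made up here for illustration:

```python
import pandas as pd


def season_year(dates: pd.Series) -> pd.Series:
    """Return the year in which each date's winter ends.

    Assumption: a winter spans December to February, so December rows
    are counted towards the following year.
    """
    return dates.dt.year + (dates.dt.month == 12).astype(int)


# Small stand-in for the sample data: two winters with a gap in between.
dates = pd.Series(
    list(pd.date_range("2016-12-30", "2017-01-02", freq="D"))
    + list(pd.date_range("2017-12-30", "2018-01-02", freq="D"))
)
df = pd.DataFrame({"Date": dates, "temperature": range(len(dates))})
df["season"] = season_year(df["Date"])

# Number of rows per season; each season can be plotted via df.groupby("season").
sizes = df.groupby("season").size().to_dict()
print(sizes)
```

The loop over years in the code above could then iterate over `df.groupby("season")` instead of filtering on `df.Date.dt.year`.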
We can group the data by computing the difference between consecutive dates (here, between their month numbers) and checking whether it exceeds a limit such as three months:
from matplotlib import pyplot as plt
import random
import pandas as pd

def tworzeniedaty():
    rng1 = list(pd.date_range(start='2016-01-01', end='2016-02-29', freq='D'))
    rng2 = list(pd.date_range(start='2016-12-15', end='2017-02-28', freq='D'))
    rng3 = list(pd.date_range(start='2017-12-15', end='2018-02-28', freq='D'))
    rng4 = list(pd.date_range(start='2018-12-15', end='2019-02-28', freq='D'))
    rng5 = list(pd.date_range(start='2019-12-15', end='2020-02-29', freq='D'))
    return rng1 + rng2 + rng3 + rng4 + rng5

lista = [random.randrange(1, 10, 1) for i in range(len(tworzeniedaty()))]
df = pd.DataFrame({'Date': tworzeniedaty(), 'temperature': lista})

# assuming that the df is sorted by date, we look for jumps of more than 3 in the month number
# then we label the groups with consecutive numbers
df["groups"] = (df["Date"].dt.month.diff() > 3).cumsum()
n = 1 + df["groups"].max()

# creating the desired number of subplots
# squeeze=False keeps axes an array even if there is only one group
fig, axes = plt.subplots(1, n, figsize=(15, 5), sharey=True, squeeze=False)

# plotting each group into a subplot
for (i, group_df), ax in zip(df.groupby("groups"), axes.flat):
    ax.plot(group_df["Date"], group_df["temperature"])

fig.autofmt_xdate(rotation=45)
plt.tight_layout()
plt.show()
Sample output:
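The month-number comparison in the code above works for the sample data because every gap runs from February to December. A variant that compares the timestamps themselves should also catch gaps that do not show up as a jump in the month number. A sketch, assuming the frame is sorted by date; the 90-day threshold and the helper name `label_groups` are illustrative choices, not part of the answer:

```python
import pandas as pd


def label_groups(dates: pd.Series, max_gap: pd.Timedelta) -> pd.Series:
    """Number consecutive runs of dates, starting a new run after each gap.

    Assumes `dates` is sorted in ascending order.
    """
    # diff() is NaT for the first row; NaT > max_gap is False, so the first run is 0.
    return (dates.diff() > max_gap).cumsum()


# Stand-in for the first three date ranges of the sample data.
dates = pd.Series(
    list(pd.date_range("2016-01-01", "2016-02-29", freq="D"))
    + list(pd.date_range("2016-12-15", "2017-02-28", freq="D"))
    + list(pd.date_range("2017-12-15", "2018-02-28", freq="D"))
)
groups = label_groups(dates, pd.Timedelta(days=90))
print(groups.nunique())
```

The resulting labels can be assigned to `df["groups"]` and used with the same `groupby` loop as before.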
Obviously, some fine-tuning is necessary if there are more groups. In that case a grid would be appropriate: one can create a subplot grid and remove the unnecessary subplots, as in this matplotlib example. The x-labels probably also need some adjustment with a matplotlib Locator and Formatter for a better appearance. Some of this can be automated by using the grouping variable with hue in seaborn; however, this may lead to a different set of problems.
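As a rough illustration of the grid idea and the Locator/Formatter adjustment, here is a sketch under a few assumptions: three columns per row is an arbitrary choice, the data frame is a small stand-in with a ready-made `groups` column, and `AutoDateLocator` with `ConciseDateFormatter` is just one of several possible combinations.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data: five 30-day runs, already labelled 0..4 in "groups".
starts = ["2016-01-01", "2016-12-15", "2017-12-15", "2018-12-15", "2019-12-15"]
frames = []
for k, start in enumerate(starts):
    rng = pd.date_range(start=start, periods=30, freq="D")
    frames.append(pd.DataFrame({"Date": rng, "temperature": range(30), "groups": k}))
df = pd.concat(frames, ignore_index=True)

n = df["groups"].nunique()
ncols = 3  # arbitrary choice for this sketch
nrows = math.ceil(n / ncols)
fig, axes = plt.subplots(nrows, ncols, figsize=(15, 4 * nrows),
                         sharey=True, squeeze=False)
flat_axes = axes.ravel()

for (label, group_df), ax in zip(df.groupby("groups"), flat_axes):
    ax.plot(group_df["Date"], group_df["temperature"])
    # Let matplotlib choose tick positions and a compact date format.
    locator = mdates.AutoDateLocator()
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

# Remove the grid cells that received no data.
for ax in flat_axes[n:]:
    fig.delaxes(ax)

fig.tight_layout()
```

With five groups and three columns this yields a 2 x 3 grid in which the last cell is removed; `fig.savefig(...)` or `plt.show()` can be called afterwards to view the result.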