How to separate visualizations in subplots by year or by showing multiple lines on the same plot?

Question

Before asking this question, I spent a day searching previous Stack Overflow answers and the rest of the Internet, but I couldn't find a solution to my problem.

I have a data frame of oil production in the US over time. The data consists of a date column and the corresponding values. A minimal reproducible example for loading the data is below:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://raw.githubusercontent.com/Arsik36/SO/master/Oil%20production.csv',
                 parse_dates=['date'],
                 index_col='date')

I use the below code to visualize a general trend in oil production over time:

# Visualizing Time Series
df.value.plot(title = 'Oil production over time')

# Specifying naming convention for x-axis
plt.xlabel('Date')

# Specifying naming convention for y-axis
plt.ylabel('Oil production volume')

# Improving visual aesthetics
plt.tight_layout()

# Showing the result
plt.show()

If you run this code in your environment, you will see that the plot shows the values over time as a single line. What I struggle with is either separating the plot into subplots by year (for example, 1995 to 1997) or showing a different line for each year on one graph.

df['1995' : '1997'].value.plot(title = 'Oil production over time', subplots = True)

When I use this code, it correctly subsets my data to the years 1995 through 1997, and with subplots = True the graph is separated by year on the x-axis. However, if you run it in your environment, you will see that it still uses a single line to show the results for all 3 years. What I am trying to do is either separate the plot into 3 subplots for the years 1995, 1996, and 1997, or show 3 lines in one plot, each line corresponding to one year.

It is important to me to be able to do this by keeping the date column as the index column without creating any additional columns (if possible) to solve this problem.

Thank you in advance for your help.

Answer 1

As far as I know, you're right that there is no ready-made solution for this in Python. In R, the fpp2 package has an implementation for it.

The solution I've come up with is to get the data from each year from your data and plot it consecutively in a for loop.

years = [1995, 1996, 1997]

fig, ax = plt.subplots(figsize=(10, 30))

for year in years:
    # Slice the data for one year
    aux = df[df.index.year == year]

    # Replace the real dates with a shared placeholder timeline (starting in 2000)
    # so that all lines are plotted over the same time frame.
    # freq='W' assumes the data is weekly.
    common_index = pd.date_range('2000-01-01', periods=len(aux), freq='W')

    # Plot against the shared timeline, one labelled line per year
    ax.plot(common_index, aux['value'].to_numpy(), label=str(year))

ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they can be read clearly

plt.show()

This yields a result like this:

The only thing left to do would be to format the x axis labels so only the months are displayed.
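
One way to handle that last step is sketched below, using matplotlib's MonthLocator and DateFormatter from matplotlib.dates. To stay self-contained, the sketch builds a synthetic weekly data frame instead of downloading the CSV; the synthetic values, the helper name plot_years_by_month, and the placeholder year 2000 are all assumptions made for illustration, not part of the original data or answer.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_years_by_month(df, years):
    """Hypothetical helper: one line per year on a shared axis with month-only tick labels."""
    fig, ax = plt.subplots(figsize=(10, 6))
    for year in years:
        aux = df[df.index.year == year]
        # Shared placeholder timeline (year 2000) so every year covers the same x-range.
        # freq="W" assumes weekly observations.
        common_index = pd.date_range("2000-01-01", periods=len(aux), freq="W")
        ax.plot(common_index, aux["value"].to_numpy(), label=str(year))

    # One tick per month, labelled with the abbreviated month name only
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

    ax.set_xlabel("Month")
    ax.set_ylabel("Oil production volume")
    ax.legend(title="Year")
    fig.tight_layout()
    return fig, ax


# Synthetic stand-in for the oil production data (assumption for illustration only):
# weekly values with a DatetimeIndex named 'date' and a single 'value' column.
dates = pd.date_range("1995-01-01", "1997-12-31", freq="W")
df = pd.DataFrame({"value": np.linspace(100.0, 200.0, len(dates))}, index=dates)
df.index.name = "date"

fig, ax = plot_years_by_month(df, [1995, 1996, 1997])
# Call plt.show() here to display the figure interactively.
```

With the real data, the synthetic df would be replaced by the one loaded with read_csv in the question. The date column stays as the index and no extra columns are added to df; the placeholder timeline only exists inside the loop.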
