
How to make a correct COVID tracking time series plot with matplotlib in Python?

I want to track the number of new COVID-19 cases in each establishment of a company, as a daily time series. I'd like to see how new COVID-19 cases can be tracked in near real time with a clear EDA plot. I tried using matplotlib to draw a histogram for each company on one page but couldn't get it right. Can anyone point out how to do this correctly?

Reproducible data:

Here is the reproducible COVID-19 tracking time series data in this gist. In this data, est refers to the establishment code, so each company might have multiple establishment codes.

My attempt

Here is my attempt with seaborn and matplotlib:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from datetime import timedelta, datetime

bigdf = pd.read_csv("coviddf.csv")

markers = {"new_case_sum": "s", "est_company": "X"}
for t in bigdf.company.unique():
    grouped = bigdf[bigdf.company==t]
    res = grouped.groupby(['run_date','county-state', 'company'])['new'].sum().unstack().reset_index('run_date')
    f, axes = plt.subplots(nrows=len(bigdf.company), ncols= 1, figsize=(20, 7), squeeze=False)
    for j in range(len(bigdf.company)):
        p = sns.scatterplot('run_date', 'new', data=res, hue='company', markers=markers, style='cats', ax=axes[j, 0])
        p.set_title(f'Threshold: {t}\n{pt}')
        p.set_xlim(data['run_date'].min() - timedelta(days=60), data['run_date'].max() + timedelta(days=60))
        plt.legend(bbox_to_anchor=(1.04, 0.5), loc="center left", borderaxespad=0)

But I couldn't get a correct plot. I think I aggregated the data correctly for plotting, but somehow I used the wrong data attributes to render the plot. Can anyone suggest where my mistake is, or a better approach?

Desired plot

Ideally, I want to render a plot with a structure like this (the attached plot is just a reference from another site):

[image: enter image description here]

Can anyone suggest how to fix my approach above, or a better way to make a time series plot for COVID tracking? Thanks.

Update:

In my attempt, I tried to aggregate the new case numbers across all establishments in each company and then make a line chart or histogram. How can we make a line chart that shows the confirmed, death, and new cases of every establishment (the est column) in each company over time, in a one-page plot?

  • The following code uses sns.FacetGrid and sns.barplot.
  • Each company gets its own grid, and each column in it is a barplot for one est.
    • The x-axis is run_date. I added extra data so there would be two dates.
    • The y-axis is val, and the hue is type (new, confirmed, or death).
  • .stack is used on the groupby result to stack new, confirmed, and death into one column.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.tight_layout() and plt.show() below

# load and clean data
df = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/318595505684ea4248a6cc0949788d33/raw/31bbeb08f329b4b96605b8f2a48f6c74c3e0b594/coviddf.csv")
df.drop(columns=['Unnamed: 0'], inplace=True)  # drop this extra column
df.run_date = pd.to_datetime(df.run_date)  # set run_date to a datetime format

# plot
for company, d in df.groupby('company'):
    # aggregate per date/location/establishment, then stack the three measures into one long 'val' column
    data = d.groupby(['run_date','county-state', 'company', 'est'], as_index=True).agg({'new': 'sum', 'confirmed': 'sum', 'death': 'sum'}).stack().reset_index().rename(columns={'level_4': 'type', 0: 'val'})
#     display(data)  # if you're not using Jupyter, change display to print
#     print('\n')
    print(company)
    # one grid per company; a separate name avoids overwriting the loop variable
    grid = sns.FacetGrid(data, col='est', sharex=False, sharey=False, height=5, col_wrap=4)
    grid.map(sns.barplot, 'run_date', 'val', 'type', order=data.run_date.dt.date.unique(), hue_order=data['type'].unique())
    grid.add_legend()
    grid.set_xticklabels(rotation=90)
    grid.set(yscale='log')
    plt.tight_layout()
    plt.show()

groupby example for Vergin

     run_date      county-state company  est       type    val
0  2020-08-30    ColfaxNebraska  Vergin  86M        new      2
1  2020-08-30    ColfaxNebraska  Vergin  86M  confirmed    718
2  2020-08-30    ColfaxNebraska  Vergin  86M      death      5
3  2020-08-30        FordKansas  Vergin  86K        new      0
4  2020-08-30        FordKansas  Vergin  86K  confirmed   2178
5  2020-08-30        FordKansas  Vergin  86K      death     10
6  2020-08-30  FresnoCalifornia  Vergin  354        new      0
7  2020-08-30  FresnoCalifornia  Vergin  354  confirmed  23932
8  2020-08-30  FresnoCalifornia  Vergin  354      death    239
9  2020-08-30    MorganColorado  Vergin  86R        new      1
10 2020-08-30    MorganColorado  Vergin  86R  confirmed    711
11 2020-08-30    MorganColorado  Vergin  86R      death     48
12 2020-08-30       ParmerTexas  Vergin  86E        new      1
13 2020-08-30       ParmerTexas  Vergin  86E  confirmed    381
14 2020-08-30       ParmerTexas  Vergin  86E      death      7
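
To see the .stack step in isolation, here is a minimal sketch on a toy frame. The est codes A1 and B2 and all values are made up for illustration and are not taken from the gist; only the column names mirror the real data.

```python
import pandas as pd

# Toy wide-form frame; est codes and values are hypothetical.
wide = pd.DataFrame({
    "run_date": pd.to_datetime(["2020-08-30", "2020-08-30"]),
    "est": ["A1", "B2"],
    "new": [2, 0],
    "confirmed": [718, 2178],
    "death": [5, 10],
})

# Move the identifier columns into the index, then stack the three
# measure columns into a single column of values.
long_df = (
    wide.set_index(["run_date", "est"])
        .stack()
        .reset_index()
        # the unnamed stacked level becomes 'level_2', the values become column 0
        .rename(columns={"level_2": "type", 0: "val"})
)
print(long_df)
```

Each input row turns into three output rows, one per measure, which is the long form that the hue argument expects.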

Example Plots

[image: enter image description here]
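
The update to the question asks for a line chart rather than bars. One possible way to get that with plain matplotlib is sketched below, with one panel per est and one line per measure. This is an illustrative alternative, not the code that produced the plots above; the toy data, the est codes A1 and B2, and the choice of a symlog y-axis are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy data for a single company; est codes and values are hypothetical.
data = pd.DataFrame({
    "run_date": pd.to_datetime(["2020-08-29", "2020-08-30"] * 2),
    "est": ["A1", "B2", "A1", "B2"],
    "new": [1, 0, 2, 3],
    "confirmed": [716, 2175, 718, 2178],
    "death": [5, 10, 5, 10],
})

measures = ["new", "confirmed", "death"]
ests = sorted(data["est"].unique())

# One panel per establishment, laid out in a single row.
fig, axes = plt.subplots(nrows=1, ncols=len(ests),
                         figsize=(5 * len(ests), 4), squeeze=False)
for ax, est in zip(axes[0], ests):
    sub = data[data["est"] == est].sort_values("run_date")
    for col in measures:
        ax.plot(sub["run_date"], sub[col], marker="o", label=col)
    ax.set_title(f"est {est}")
    # symlog copes with zero counts while still compressing large values
    ax.set_yscale("symlog")
    ax.tick_params(axis="x", rotation=90)
axes[0, 0].legend()
fig.tight_layout()
```

With the real data, the same loop could run once per company inside a groupby on company, followed by plt.show() for each figure.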

Plotly with Geodata

import pandas as pd
import plotly.express as px

# load and clean data
df = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/318595505684ea4248a6cc0949788d33/raw/31bbeb08f329b4b96605b8f2a48f6c74c3e0b594/coviddf.csv")
df.drop(columns=['Unnamed: 0'], inplace=True)  # drop this extra column
df.run_date = pd.to_datetime(df.run_date)  # set run_date to a datetime format

# convert to long form
dfl = df.set_index(['company', 'est', 'latitude', 'longitude'])[['confirmed', 'new', 'death']].stack().reset_index().rename(columns={'level_4': 'type', 0: 'vals'})

# plot
fig = px.scatter_geo(dfl,
                     lon='longitude',
                     lat='latitude',
                     color="type", # which column to use to set the color of markers
                     hover_name="company", # column added to hover information
                     size="vals", # size of markers
                     projection="albers usa")
fig.show()

[image: enter image description here]

