
How to make annotated grouped stacked barchart in matplotlib?

I have COVID-19 tracking time-series data that I scraped from a COVID-19 tracking site, and I want to make an annotated grouped stacked bar chart from it. I used matplotlib and seaborn and managed to render a bar chart, but the annotation approaches I found on SO did not give me a correctly annotated plot. I also could not get the bars both grouped and stacked for the time-series data. Can anyone suggest a way of doing this?

My attempt

Here is the reproducible time-series data that I scraped from the COVID-19 tracking site, along with my plotting code:

import pandas as pd
from datetime import date
import matplotlib.pyplot as plt
import seaborn as sns

bigdf = pd.read_csv("coviddf.csv")
bigdf['run_date'] = pd.to_datetime(bigdf['run_date'])

for company, d in bigdf.groupby('company'):
    # reshape to long format: one row per (run_date, county-state, company, est, type)
    data = d.groupby(['run_date','county-state', 'company', 'est'], as_index=True).agg({'new': 'sum', 'confirmed': 'sum', 'death': 'sum'}).stack().reset_index().rename(columns={'level_4': 'type', 0: 'val'})
    print(f'{company}')
    # one facet per establishment; named grid so it no longer shadows the loop variable
    grid = sns.FacetGrid(data, col='est', sharex=False, sharey=False, height=5, col_wrap=4)
    grid.map(sns.barplot, 'run_date', 'val', 'type', order=data.run_date.dt.date.unique(), hue_order=data['type'].unique())
    grid.add_legend()
    grid.set_xticklabels(rotation=90)
    grid.set(yscale='log')
    plt.tight_layout()
    plt.show()

My attempt has a couple of issues. I need a grouped stacked bar chart in which each group is a company and each stacked bar is an individual establishment (the est column in coviddf.csv). A company can have multiple establishments, and for each one I want to see the number of new, confirmed, and death COVID-19 cases. Is there a way to make an annotated grouped stacked bar chart for this time series, and to put all of these plots on one page?

desired output

I tried to make a grouped stacked bar chart the way this post and a second related post did. Here is the annotated grouped stacked bar chart that I want to make:

[image: desired annotated grouped stacked bar chart]

Can anyone point out how to get there from my current attempt?

Grouped Bar Plot

  • This is not exactly what you've asked for, but I think it's a better option.
    • It's certainly an easier option.
    • The issue with stacked bars is that confirmed is so large compared to the other values that you will not be able to see new and death.
  • I think the best option for this data is a horizontal bar plot with a group for each company and est.
import pandas as pd
import matplotlib.pyplot as plt

# load the data
df = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/318595505684ea4248a6cc0949788d33/raw/31bbeb08f329b4b96605b8f2a48f6c74c3e0b594/coviddf.csv")
df.drop(columns=['Unnamed: 0'], inplace=True)  # drop this extra column

# select columns and shape the dataframe
dfs = df.iloc[:, [2, 3, 4, 12, 13]].set_index(['company', 'est']).sort_index(level=0)

# display(dfs)
#                       confirmed  new  death
# company        est
# Agri  Co.      235        10853    0    237
# CS  Packers    630        10930   77    118
# Caviness       675          790    5     19
# Central Valley 6063A       6021   44     72
# FPL            332         5853   80    117

# plot
ax = dfs.plot.barh(figsize=(8, 25), width=0.8)
plt.xscale('log')
plt.grid(True)
plt.tick_params(labelbottom=True, labeltop=True)
plt.xlim(10**0, 1000000)

# annotate the bars
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()

    # The width of the bar is the count value and can be used as the label
    label_text = f'{width:.0f}'

    label_x = x + width
    label_y = y + height / 2

    # don't include a label if the value is effectively 0
    if width > 0.001:
        ax.annotate(label_text, xy=(label_x, label_y), va='center', xytext=(2, -1), textcoords='offset points')

[image: annotated grouped horizontal bar plot]
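As an alternative to looping over ax.patches, Matplotlib 3.4 and later provide Axes.bar_label, which labels a whole bar container at once. The following is a minimal sketch, assuming Matplotlib >= 3.4; it rebuilds only three of the rows displayed above by hand instead of loading the full CSV, and all_labels is just an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Three rows copied from the displayed dataframe above (not the full data set)
dfs = pd.DataFrame(
    {"confirmed": [10853, 10930, 790], "new": [0, 77, 5], "death": [237, 118, 19]},
    index=pd.MultiIndex.from_tuples(
        [("Agri  Co.", "235"), ("CS  Packers", "630"), ("Caviness", "675")],
        names=["company", "est"],
    ),
)

ax = dfs.plot.barh(figsize=(8, 4), width=0.8)
ax.set_xscale("log")

# pandas creates one BarContainer per column (confirmed, new, death)
all_labels = []
for container in ax.containers:
    # blank out the label for zero-valued bars, as in the loop above
    labels = [f"{v:.0f}" if v > 0 else "" for v in container.datavalues]
    all_labels.extend(ax.bar_label(container, labels=labels, padding=2))
```

bar_label returns the created annotations, so they can still be restyled afterwards if needed.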

Stacked Bar Plot

  • new and death are barely visible compared to confirmed.
dfs.plot.barh(stacked=True, figsize=(8, 15))
plt.xscale('log')

[image: stacked horizontal bar plot]
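For the layout the question describes (one stacked bar per establishment, with the bars of each company placed next to each other), plain Matplotlib may be enough. The following is only a sketch under stated assumptions: the numbers are made-up toy values, not taken from coviddf.csv, the helper grouped_positions is a hypothetical name introduced here, and a linear axis is used because stacked segments are hard to read on a log axis. It also assumes Matplotlib >= 3.4 for bar_label.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy values for illustration only, NOT taken from coviddf.csv
toy = pd.DataFrame({
    "company": ["A", "A", "B", "C", "C", "C"],
    "est": ["a1", "a2", "b1", "c1", "c2", "c3"],
    "new": [5, 8, 3, 6, 2, 9],
    "confirmed": [40, 55, 30, 60, 25, 45],
    "death": [4, 6, 2, 5, 1, 7],
}).sort_values(["company", "est"])


def grouped_positions(group_sizes, gap=1.0):
    """Hypothetical helper: bars are 1 unit apart within a group,
    with `gap` extra units of space between consecutive groups."""
    positions, start = [], 0.0
    for size in group_sizes:
        positions.extend(start + i for i in range(size))
        start += size + gap
    return positions


# number of establishments per company, in order of appearance
sizes = toy.groupby("company", sort=False).size().tolist()
x = grouped_positions(sizes)

fig, ax = plt.subplots(figsize=(8, 4))
bottom = [0] * len(toy)
for col in ["new", "death", "confirmed"]:
    bars = ax.bar(x, toy[col].tolist(), bottom=bottom, label=col)
    ax.bar_label(bars, label_type="center")  # annotate each stacked segment
    bottom = [b + v for b, v in zip(bottom, toy[col].tolist())]

# two-line tick labels: company on the first line, establishment on the second
ax.set_xticks(x)
ax.set_xticklabels((toy["company"] + "\n" + toy["est"]).tolist())
ax.legend()
```

With the real data, confirmed would still dominate each stack, so the small segments and their labels may overlap.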

I had trouble finding info on how to create a GROUPED and STACKED bar chart in matplotlib and later Plotly.

Here is my attempt at solving your problem (using Plotly):

# Import packages
import pandas as pd
import numpy as np
import plotly.graph_objects as go

# Load data (I used the raw GitHub link so that no local file download was required)
bigdf = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/318595505684ea4248a6cc0949788d33/raw/31bbeb08f329b4b96605b8f2a48f6c74c3e0b594/coviddf.csv")

# Get all companies names and number of companies
allComp = np.unique(bigdf.company)
numComp = allComp.shape[0]

# For all the companies
for i in range(numComp):
    # Grab company data and the names of the establishments for that company
    comp = allComp[i]
    compData = bigdf.loc[bigdf.company == comp]
    estabs = compData.est.to_numpy().astype(str)
    numEst = compData.shape[0]

    # Grab the new, confirmed, and death values for each of the establishments in that company
    newVals = []
    confirmedVals = []
    deathVals = []
    for j in range(numEst):  # j, so the outer loop index i is not reused
        estabData = compData.loc[compData.est == estabs[j]]
        newVals.append(estabData.new.to_numpy()[0])
        confirmedVals.append(estabData.confirmed.to_numpy()[0])
        deathVals.append(estabData.death.to_numpy()[0])

    # Load that data into a Plotly graph object
    fig = go.Figure(
        data=[
            go.Bar(name='New', x=estabs, y=newVals, yaxis='y', offsetgroup=1),
            go.Bar(name='Confirmed', x=estabs, y=confirmedVals, yaxis='y', offsetgroup=2),
            go.Bar(name='Death', x=estabs, y=deathVals, yaxis='y', offsetgroup=3)
        ]
    )

    # Update the layout (add title, set x/y axis titles, and bar graph mode)
    fig.update_layout(title='COVID Data for ' + comp, xaxis=dict(type='category'), xaxis_title='Establishment', 
                      yaxis_title='Value', barmode='stack')
    fig.show()

The output is 16 separate Plotly graphs, one per company. The graphs are interactive, so you can toggle individual traces, which helps because the new/confirmed/death values are hard to show on a common scale. Each plot has all the establishments for that company on the x-axis, and the new/confirmed/death values for each establishment as a stacked bar chart.

Here is an example plot: HBS company COVID data

I know this doesn't completely answer your question, but I hope you appreciate my effort:)

