
How to make annotated grouped stacked barchart in matplotlib?

I have covid19 tracking time series data that I scraped from a covid19 tracking site. I want to make an annotated grouped stacked barchart. I used matplotlib and seaborn for plotting and managed to render a bar chart from the data. I tried the plot annotation approaches I found on SO but didn't get a correctly annotated plot, and I also have trouble getting a grouped stacked barchart for the time series data. Can anyone suggest a possible way of doing this?

My attempt

Here is the reproducible time series data that I scraped from the covid19 tracking site:

import pandas as pd
from datetime import date
import matplotlib.pyplot as plt
import seaborn as sns

bigdf = pd.read_csv("coviddf.csv")
bigdf['run_date'] = pd.to_datetime(bigdf['run_date'])

for g, d in bigdf.groupby(['company']):
    data = d.groupby(['run_date','county-state', 'company', 'est'], as_index=True).agg({'new': sum, 'confirmed': sum, 'death': sum}).stack().reset_index().rename(columns={'level_4': 'type', 0: 'val'})
    print(f'{g}')
    g = sns.FacetGrid(data, col='est', sharex=False, sharey=False, height=5, col_wrap=4)
    g.map(sns.barplot, 'run_date', 'val', 'type', order=data.run_date.dt.date.unique(), hue_order=data['type'].unique())
    g.add_legend()
    g.set_xticklabels(rotation=90)
    g.set(yscale='log')
    plt.tight_layout()
    plt.show()

I have a couple of issues with the above attempt. I need a grouped stacked barchart where each group is a different company and each stacked bar is an individual establishment (the est column in coviddf.csv). Each company may have multiple establishments, so I want to see the number of new, confirmed and death covid19 cases per establishment in a grouped stacked barchart. Is there any way to make an annotated grouped stacked barchart for this time series? How can I put these plots on one page?

Desired output

I tried to make a grouped stacked barchart the way this post and a second related post did. Here is the annotated grouped stacked barchart that I want to make:

[image: desired annotated grouped stacked barchart]

Can anyone point out how to get there from my current attempt?

Grouped Bar Plot

  • This is not exactly what you've asked for, but I think it's a better option.
    • It's certainly an easier option.
    • The issue with the stacked bars is that confirmed is so large compared to the other values that you will not be able to see new and death.
  • I think the best option for this data is a horizontal bar plot with a group for each company & est.
import pandas as pd
import matplotlib.pyplot as plt

# load the data
df = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/318595505684ea4248a6cc0949788d33/raw/31bbeb08f329b4b96605b8f2a48f6c74c3e0b594/coviddf.csv")
df.drop(columns=['Unnamed: 0'], inplace=True)  # drop this extra column

# select columns and shape the dataframe
dfs = df.iloc[:, [2, 3, 4, 12, 13]].set_index(['company', 'est']).sort_index(level=0)

# display(dfs)
                      confirmed  new  death
company        est                         
Agri  Co.      235        10853    0    237
CS  Packers    630        10930   77    118
Caviness       675          790    5     19
Central Valley 6063A       6021   44     72
FPL            332         5853   80    117

# plot
ax = dfs.plot.barh(figsize=(8, 25), width=0.8)
plt.xscale('log')
plt.grid(True)
plt.tick_params(labelbottom=True, labeltop=True)
plt.xlim(10**0, 1000000)

# annotate the bars
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()

    # The width of the bar is the count value and can be used as the label
    label_text = f'{width:.0f}'

    label_x = x + width
    label_y = y + height / 2

    # don't include label if it's equivalently 0
    if width > 0.001:
        ax.annotate(label_text, xy=(label_x, label_y), va='center', xytext=(2, -1), textcoords='offset points')

[image: annotated grouped horizontal bar plot]
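The manual loop over ax.patches can probably be shortened on matplotlib 3.4 or later, where Axes.bar_label is available. The following is only a sketch under that assumption: it rebuilds the five rows shown in the display above by hand instead of downloading the CSV, and it forces the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop for interactive use
import pandas as pd

# The five rows from the display above, typed in by hand
dfs = pd.DataFrame(
    {
        "confirmed": [10853, 10930, 790, 6021, 5853],
        "new": [0, 77, 5, 44, 80],
        "death": [237, 118, 19, 72, 117],
    },
    index=pd.MultiIndex.from_tuples(
        [("Agri  Co.", "235"), ("CS  Packers", "630"), ("Caviness", "675"),
         ("Central Valley", "6063A"), ("FPL", "332")],
        names=["company", "est"],
    ),
)

ax = dfs.plot.barh(figsize=(8, 6), width=0.8)
ax.set_xscale("log")
ax.set_xlim(1, 1_000_000)

# pandas creates one BarContainer per column; label every bar with its value
# and leave zero-valued bars unlabeled, as the loop above does
for container in ax.containers:
    labels = [f"{float(v):.0f}" if v > 0 else "" for v in container.datavalues]
    ax.bar_label(container, labels=labels, padding=2)
```

Passing labels explicitly keeps the "skip zeros" behaviour; without it, bar_label would print a label for every bar.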

Stacked Bar Plot

  • new and death are barely visible compared to confirmed.
dfs.plot.barh(stacked=True, figsize=(8, 15))
plt.xscale('log')

[image: stacked horizontal bar plot]
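For readers who still want the grouped and stacked layout from the question, one possible approach is to compute the bar positions by hand: one stacked bar per establishment, with an empty slot between companies. The sketch below uses hypothetical toy values (the companies A and B and all numbers are made up, not taken from coviddf.csv) and a linear axis, so the visibility problem described above still applies to real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy values, NOT taken from coviddf.csv
toy = pd.DataFrame(
    {
        "company": ["A", "A", "A", "B", "B"],
        "est": ["a1", "a2", "a3", "b1", "b2"],
        "new": [5, 8, 0, 12, 7],
        "confirmed": [120, 90, 60, 200, 150],
        "death": [10, 4, 6, 15, 9],
    }
).sort_values(["company", "est"]).reset_index(drop=True)

metrics = ["new", "confirmed", "death"]

# Bars are consecutive within a company; each new company shifts down one slot
company_idx = toy.groupby("company", sort=False).ngroup().to_numpy()
y = np.arange(len(toy)) + company_idx

fig, ax = plt.subplots(figsize=(8, 4))
left = np.zeros(len(toy))
for metric in metrics:
    values = toy[metric].to_numpy()
    bars = ax.barh(y, values, left=left, height=0.8, label=metric)
    # Write each non-zero value in the middle of its segment
    ax.bar_label(bars, labels=[str(v) if v else "" for v in values],
                 label_type="center")
    left = left + values  # the next metric starts where this one ended

ax.set_yticks(y)
ax.set_yticklabels((toy["company"] + " / " + toy["est"]).tolist())
ax.invert_yaxis()  # first company at the top
ax.legend()
```

The gap between groups comes entirely from the y positions, so a wider gap only needs a larger multiplier on company_idx.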

I had trouble finding info on how to create a GROUPED and STACKED bar chart in matplotlib and later Plotly.

Here is my attempt at solving your problem (using Plotly):

# Import packages
import pandas as pd
import numpy as np
import plotly.graph_objects as go

# Load data (I used the raw GitHub link so that no local file download was required)
bigdf = pd.read_csv("https://gist.githubusercontent.com/jerry-shad/318595505684ea4248a6cc0949788d33/raw/31bbeb08f329b4b96605b8f2a48f6c74c3e0b594/coviddf.csv")

# Get all companies names and number of companies
allComp = np.unique(bigdf.company)
numComp = allComp.shape[0]

# For all the companies
for i in range(numComp):
    # Grab company data and the names of the establishments for that company
    comp = allComp[i]
    compData = bigdf.loc[bigdf.company == comp]
    estabs = compData.est.to_numpy().astype(str)
    numEst = compData.shape[0]

    # Grab the new, confirmed, and death values for each of the establishments in that company
    newVals = []
    confirmedVals = []
    deathVals = []
    for j in range(numEst):  # separate index so the outer loop variable is not shadowed
        estabData = compData.loc[compData.est == estabs[j]]
        newVals.append(estabData.new.to_numpy()[0])
        confirmedVals.append(estabData.confirmed.to_numpy()[0])
        deathVals.append(estabData.death.to_numpy()[0])

    # Load that data into a Plotly graph object
    fig = go.Figure(
        data=[
            go.Bar(name='New', x=estabs, y=newVals, yaxis='y', offsetgroup=1),
            go.Bar(name='Confirmed', x=estabs, y=confirmedVals, yaxis='y', offsetgroup=2),
            go.Bar(name='Death', x=estabs, y=deathVals, yaxis='y', offsetgroup=3)
        ]
    )

    # Update the layout (add title, set x/y axis titles, and bar graph mode)
    fig.update_layout(title='COVID Data for ' + comp, xaxis=dict(type='category'), xaxis_title='Establishment',
                      yaxis_title='Value', barmode='stack')
    fig.show()

The output is 16 separate Plotly graphs, one per company. They are interactive, so you can toggle individual traces, which helps because scaling the new/confirmed/death values against each other wasn't so easy. Each plot has all the establishments for that company on the x-axis, and the new/confirmed/death values for each establishment as a stacked bar chart.

Here is an example plot: [image: COVID data for company HBS]

I know this doesn't completely answer your question, but I hope you appreciate my effort :)
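If the per-company charts need to sit on one page rather than in separate figures, the same per-company split can be laid out as a grid of subplots. The sketch below swaps Plotly for matplotlib subplots, which is not what the code above does, and uses hypothetical toy values (companies A, B, C and all numbers are made up) in place of coviddf.csv.

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy values, NOT taken from coviddf.csv
toy = pd.DataFrame(
    {
        "company": ["A", "A", "B", "B", "B", "C"],
        "est": ["a1", "a2", "b1", "b2", "b3", "c1"],
        "new": [5, 8, 12, 7, 3, 9],
        "confirmed": [120, 90, 200, 150, 80, 60],
        "death": [10, 4, 15, 9, 2, 6],
    }
)
metrics = ["new", "confirmed", "death"]

companies = toy["company"].unique()
ncols = 2
nrows = math.ceil(len(companies) / ncols)
fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 3.5 * nrows),
                         squeeze=False)

# One panel per company, one stacked bar per establishment
for ax, (company, group) in zip(axes.ravel(), toy.groupby("company")):
    group.set_index("est")[metrics].plot.bar(stacked=True, ax=ax, rot=0,
                                             legend=False)
    ax.set_title(f"COVID Data for {company}")
    ax.set_xlabel("Establishment")
    ax.set_ylabel("Value")

# Hide any panels left over when the grid has more slots than companies
for ax in axes.ravel()[len(companies):]:
    ax.set_visible(False)

# A single shared legend instead of one per panel
handles, labels = axes.ravel()[0].get_legend_handles_labels()
fig.legend(handles, labels, loc="upper right")
fig.tight_layout()
```

With the real data, ncols and figsize would need tuning for 16 companies, and the panels lose the trace toggling that the Plotly figures offer.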

