
How to make quadruple bar graph from this pandas dataframe?

I am a new coder, and for my class we have an assignment where we are supposed to make an API call to an outside dataset and then plot something interesting about the data. I made my API call to an NYC tree census dataset. The data shows both the tree species and the health status (Good, Fair, Poor, Dead). I want to make a stacked bar plot showing the percentage of each health status for each tree species. For example, I want one bar for Maple trees, showing that 56% are good, 26% are fair, 13% are poor, and 5% are dead. I'm not really sure how to accomplish all of this. Here is a screenshot showing how my dataset looks. Thanks for any advice!

[screenshot of the dataframe]

  • I've used Kaggle as the source of data. I also found this API, but did not use it because it was too slow for me.
  • The data I've used has no dead trees; the statuses are just Poor, Fair and Good.
  • I have used the pandas-percentage-of-total-with-groupby technique to calculate the percentages.
  • I prefer plotly to matplotlib for plotting. Both are simple to use.
  • There really are too many bars for this to be a high-quality visualisation.
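The percentage-of-total-with-groupby idea mentioned above can be sketched on a tiny made-up table. The rows below are invented for illustration (only the column names follow the census data used later), and `transform("sum")` is one of several equivalent ways to get the per-species total:

```python
import pandas as pd

# Made-up toy rows; only the column names mirror the census data
toy = pd.DataFrame(
    {
        "tree_id": [1, 2, 3, 4, 5, 6],
        "spc_common": ["maple", "maple", "maple", "maple", "oak", "oak"],
        "health": ["Good", "Good", "Fair", "Poor", "Good", "Fair"],
    }
)

# count trees per (species, health) pair
counts = toy.groupby(["spc_common", "health"])["tree_id"].count()

# divide each count by the total of its species, so every species sums to 1
share = counts / counts.groupby(level="spc_common").transform("sum")

# one row per species, one column per health status
print(share.unstack("health"))
```

Each row of the unstacked result is then one stacked bar.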

get data from API (kaggle)

import kaggle.cli
import sys
import pandas as pd
from pathlib import Path
from zipfile import ZipFile

# search for data set
# sys.argv = [sys.argv[0]] + "datasets list -s \"2015-street-tree-census-tree-data.csv\"".split(" ")
# kaggle.cli.main()

# download data set
sys.argv = [sys.argv[0]] + "datasets download new-york-city/ny-2015-street-tree-census-tree-data".split(" ")
kaggle.cli.main()

zfile = ZipFile("ny-2015-street-tree-census-tree-data.zip")
zfile.infolist()

# use CSV
df = pd.read_csv(zfile.open(zfile.infolist()[0]))

prepare data and plot using plotly

import plotly.express as px

spc = 'spc_common'

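# aggregate the data and shape it for plotting
dfa = (
    df.groupby([spc, "health"])
    .agg({"tree_id": "count"})
    # group_keys=False keeps the original index; pandas >= 2.0 would otherwise
    # prepend the group key as a duplicate index level
    .groupby(level=spc, group_keys=False)
    .apply(lambda x: x / x.sum())
    .unstack("health")
    .droplevel(0, axis=1)
)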

fig = px.bar(
    dfa.reset_index(),
    x=spc,
    y=["Poor", "Fair", "Good"],
    color_discrete_sequence=["red", "blue", "green"],
)
# show the y axis as whole percentages
fig.update_layout(yaxis={"tickformat": ".0%"})

output

[image: resulting stacked bar chart]
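One possible way to deal with the "too many bars" problem is to keep only the most common species before computing the shares. The sketch below is not part of the code above: the toy rows and the `top_n` cut-off are made up, and `pd.crosstab(..., normalize="index")` is swapped in as a shorter alternative to the groupby chain.

```python
import pandas as pd

# Made-up toy rows; only the column names mirror the census data
toy = pd.DataFrame(
    {
        "spc_common": ["maple"] * 4 + ["oak"] * 3 + ["pine"],
        "health": ["Good", "Good", "Fair", "Poor", "Good", "Fair", "Fair", "Good"],
    }
)

top_n = 2  # hypothetical cut-off; pick whatever keeps the chart readable

# the top_n species with the most rows
top_species = toy["spc_common"].value_counts().nlargest(top_n).index
subset = toy[toy["spc_common"].isin(top_species)]

# normalize="index" divides each row by its row total, so each species sums to 1
pct = pd.crosstab(subset["spc_common"], subset["health"], normalize="index")
print(pct)
```

With the real data, `pct.reset_index()` could presumably be passed to `px.bar` in the same way as `dfa.reset_index()`.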

matplotlib

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(14, 3))
dfa.plot(kind="bar", stacked=True, ax=ax)
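The matplotlib version plots fractions between 0 and 1. To label the y axis as percentages, as the plotly figure does, something like the following could work. It is a self-contained sketch with made-up fractions standing in for `dfa`. The `Agg` backend is selected only so that it runs without a display, and the colours are copied from the plotly call.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Made-up fractions standing in for dfa (one row per species)
dfa_toy = pd.DataFrame(
    {"Poor": [0.25, 0.0], "Fair": [0.25, 0.5], "Good": [0.5, 0.5]},
    index=pd.Index(["maple", "oak"], name="spc_common"),
)

fig, ax = plt.subplots(figsize=(6, 3))
dfa_toy.plot(kind="bar", stacked=True, ax=ax, color=["red", "blue", "green"])

# values are fractions of 1, so xmax=1.0 maps 0.5 to "50%"
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
ax.set_ylabel("share of trees")
fig.tight_layout()
```

In a notebook the figure is displayed automatically; in a script, `fig.savefig(...)` or `plt.show()` with an interactive backend would be needed.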
