
Python Pandas totals and dates

I'm sorry for not posting the data, but it wouldn't really help. The thing is, I need to make a graph, and I have a CSV file full of information organised by date. It has 'Cases', 'Deaths', 'Recoveries', 'Critical', 'Hospitalized' and 'States' as categories. It is ordered by date and has the number of cases, deaths and recoveries per day for each state. How do I sum these categories to make a graph that shows how the total is increasing? I really have no idea how to start, so I can't post my data. Below are some numbers that try to explain what I have.

  0  2020-02-20   1  Andalucía      NaN     NaN    NaN
  1  2020-02-20   2  Aragón         NaN     NaN    NaN
  2  2020-02-20   3  Asturias       NaN     NaN    NaN
  3  2020-02-20   4  Baleares       1.0     NaN    NaN
  4  2020-02-20   5  Canarias       1.0     NaN    NaN
 ..         ...  ..  ...            ...     ...    ...
888  2020-04-06  19  Melilla       92.0    40.0    3.0
889  2020-04-06  14  Murcia      1283.0   500.0   84.0
890  2020-04-06  15  Navarra     3355.0  1488.0  124.0
891  2020-04-06  16  País Vasco  9021.0  4856.0  417.0
892  2020-04-06  17  La Rioja    2846.0   918.0   66.0

It's unclear exactly what you mean by summing the categories. I'm assuming that, for each date, you want to sum the values across all the regions to get the total values for Spain?

In that case, you will want to group by date and then .sum() the columns (you can drop the States category).

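value_columns = ["Cases", "Deaths"]  # extend with the other numeric columns
grouped_df = df.groupby("date")[value_columns].sum()  # select columns with a list, not a tuple
grouped_df.plot()  # "date" is already the index after groupby, so no set_index is needed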

This snippet will probably not work directly; you may need to reformat the dates, etc. But it should be enough to get you started.
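To make the date handling concrete, here is a minimal sketch on a tiny made-up frame. The numbers are invented for illustration, and the column names "date", "States", "Cases" and "Deaths" are assumed from the question, so adjust them to match your file.

```python
import pandas as pd

# Tiny made-up sample in the same long format (one row per date and state).
# The values are invented; only the column names follow the question.
df = pd.DataFrame({
    "date": ["2020-04-05", "2020-04-05", "2020-04-06", "2020-04-06"],
    "States": ["Murcia", "Navarra", "Murcia", "Navarra"],
    "Cases": [10.0, 20.0, 15.0, None],
    "Deaths": [1.0, 2.0, 3.0, 4.0],
})

# Parse the date strings so they sort and plot chronologically.
df["date"] = pd.to_datetime(df["date"])

# Sum every state for each date; sum() skips NaN by default.
value_columns = ["Cases", "Deaths"]
totals = df.groupby("date")[value_columns].sum()
print(totals)
```

If matplotlib is installed, totals.plot() should then draw one line per column with the dates on the x-axis.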

I think you are looking for a groupby followed by a cumsum over the value columns, leaving out the date.

columns_to_group = ['Cases', 'Deaths', 
                  'Recoveries', 'Critical', 'Hospitalized', 'date']
new_columns = ['Cases_sum', 'Deaths_sum', 
               'Recoveries_sum', 'Critical_sum', 'Hospitalized_sum']

df_grouped = df[columns_to_group].groupby('date').sum().reset_index()
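The cumsum step itself could look roughly like the sketch below. It assumes the file holds new counts per day; if the numbers are already running totals, skip it. The small frame is made up and only stands in for df_grouped.

```python
import pandas as pd

# Made-up daily totals standing in for df_grouped (one row per date).
df_grouped = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-04", "2020-04-05", "2020-04-06"]),
    "Cases": [5.0, 7.0, 3.0],
    "Deaths": [1.0, 0.0, 2.0],
})

value_columns = ["Cases", "Deaths"]
new_columns = ["Cases_sum", "Deaths_sum"]

# Sort by date first so the running total accumulates in time order.
df_grouped = df_grouped.sort_values("date").reset_index(drop=True)

# Running total of each value column, stored under the *_sum names.
cumulative = df_grouped[value_columns].cumsum()
cumulative.columns = new_columns
df_grouped = pd.concat([df_grouped, cumulative], axis=1)
print(df_grouped)
```

The date column is left out of the cumsum on purpose, since a running total only makes sense for the counts.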

For plotting, seaborn provides an easy function:

import seaborn as sns

df_melted = df_grouped.melt(id_vars=["date"])
sns.lineplot(data=df_melted, x='date', y = 'value', hue='variable')
