
Pandas - Subplotting each groupby series against Date column where count of rows in each group is different

I have a CSV file that I have read into a pandas DataFrame. The DataFrame (say 'cdata') has the columns below.

[Image: DataFrame extract]

I want to group this data by State and plot the cumulative Confirmed column for each state in the same plot. The data will be plotted against the Date column.

The data is not distributed uniformly over the Date column, i.e. not every State has a row for each Date.

[Image: data grouped by Date]

When I try to plot this using the code below, the plotted data does not look right.

fig, ax = plt.subplots(figsize=(8, 6))
count = 1
for state, df in cdata.groupby('State'):
    if count < 5:
        df.plot(x='Date', y='Confirmed', ax=ax, label=state)
        count = count + 1

plt.legend()

[Image: resulting plot]

This is clearly not right: in the data, the cumulative figure for State='Andhra Pradesh' on 1 May is 1463, not the ~400 that the plot seems to show.

What am I doing wrong here?

You are plotting the daily confirmed number, not the cumulative sum of confirmed cases. You can add a new column with the cumulative sum and plot that instead.

Also, be sure to convert the 'Date' column to a datetime type and sort by it before calculating the cumulative sum. You can do something like this:

import pandas as pd
import matplotlib.pyplot as plt

## Transform 'Date' to datetime
cdata['Date'] = pd.to_datetime(cdata['Date'])

## Sort the DataFrame by the 'Date' column
cdata = cdata.sort_values('Date')

## Calculate cumulative sum of 'Confirmed' by state
cdata['Total Confirmed'] = cdata.groupby('State')['Confirmed'].cumsum()

## Plot the first four states on one set of axes
fig, ax = plt.subplots(figsize=(8, 6))
count = 1
for state, group in cdata.groupby('State'):
    if count < 5:
        group.plot(x='Date', y='Total Confirmed', ax=ax, label=state)
        count = count + 1

plt.legend()
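
To see what the per-state cumulative sum does when states have different numbers of rows, here is a minimal sketch on made-up data. The `toy` frame and its values are hypothetical, not taken from the question; only the column names follow it.

```python
import pandas as pd

# Hypothetical daily counts; state "B" has no row for 2020-05-02.
toy = pd.DataFrame({
    "Date": ["2020-05-03", "2020-05-01", "2020-05-02", "2020-05-01", "2020-05-03"],
    "State": ["A", "A", "A", "B", "B"],
    "Confirmed": [5, 10, 20, 1, 2],
})

# Parse the dates, then order rows by state and date so the running
# total accumulates in chronological order.
toy["Date"] = pd.to_datetime(toy["Date"])
toy = toy.sort_values(["State", "Date"]).reset_index(drop=True)

# Running total within each state; rows of other states are not mixed in.
toy["Total Confirmed"] = toy.groupby("State")["Confirmed"].cumsum()

print(toy)
```

Each state accumulates independently, so a missing date in one state does not affect the totals of another.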

I was able to achieve the outcome I was looking for with the code below. However, I am sure this is not the most elegant way of doing it, and I am still looking for alternatives that are more intuitive.

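# Sum 'Confirmed' per (Date, State), then turn each state into its own column
grouped = cdata.groupby(['Date', 'State'], sort=False)['Confirmed'].sum().unstack('State')
grouped.reset_index(inplace=True)
# Skip the first column ('Date') and the last state column
columns = grouped.columns.to_list()[1:-1]
fig, ax = plt.subplots(figsize=(20, 14))
grouped.plot(x='Date', y=columns, ax=ax)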
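
A possibly more direct variant of the same idea is `pivot_table`, which does the grouping, summing, and reshaping in one call. The sketch below uses a hypothetical `toy` frame, not the real data, and assumes that several rows can share the same Date and State, which is what the `sum()` above implies.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: two rows for state "A" on 2020-05-01,
# and no row for state "B" on 2020-05-02.
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-05-01", "2020-05-01", "2020-05-02", "2020-05-01", "2020-05-03"]
    ),
    "State": ["A", "A", "A", "B", "B"],
    "Confirmed": [3, 4, 9, 1, 2],
})

# One row per date, one column per state; duplicate (Date, State) rows are
# summed, and missing combinations become NaN.
wide = toy.pivot_table(index="Date", columns="State", values="Confirmed", aggfunc="sum")

# Every state column is drawn against the shared Date index. Lines break at
# NaN gaps, so markers keep isolated points visible.
fig, ax = plt.subplots(figsize=(8, 6))
wide.plot(ax=ax, marker="o")
```

Because the dates form the index of `wide`, there is no need to pass `x=` or build a list of columns. Whether to leave the gaps or fill them (for example by carrying the last value forward) depends on what the Confirmed column actually represents.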


