Pandas - Subplot each groupby series against the Date column when the groups have different numbers of rows

Question

I have a CSV file that I have read into a pandas DataFrame (say 'cdata'). The columns used below are 'Date', 'State' and 'Confirmed'.

I want to group this data by State and plot the cumulative Confirmed column for each state on the same plot, against the Date column.

The data is not distributed uniformly over the Date column, i.e. not every State has a row for every Date.

When I try to plot this using the code below, the plotted data does not look right.

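import matplotlib.pyplot as plt

# Plot 'Confirmed' against 'Date' for the first four states on one set of axes
fig, ax = plt.subplots(figsize=(8, 6))
count = 1
for state, df in cdata.groupby('State'):
    if count < 5:
        df.plot(x='Date', y='Confirmed', ax=ax, label=state)
        count = count + 1

plt.legend()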

This is clearly wrong: in the data, the cumulative figure for State='Andhra Pradesh' on 1 May is 1463, not the ~400 that the plot seems to show.

What am I doing wrong here?

Answer 1

You are plotting the daily confirmed number, not the cumulative sum of confirmed cases. You can add a new column with the cumulative sum and plot that instead.

Also, be sure to convert the 'Date' column to a datetime type and sort by it before calculating the cumulative sum. You can do something like this:

import pandas as pd
import matplotlib.pyplot as plt

## Transform 'Date' to datetime
cdata['Date'] = pd.to_datetime(cdata['Date'])

## Sort cdata by the 'Date' column
cdata = cdata.sort_values('Date')

## Calculate cumulative sum of 'Confirmed' by state
cdata['Total Confirmed'] = cdata.groupby('State')['Confirmed'].cumsum()

## Plot the first four states
fig, ax = plt.subplots(figsize=(8, 6))
count = 1
for state, df in cdata.groupby('State'):
    if count < 5:
        df.plot(x='Date', y='Total Confirmed', ax=ax, label=state)
        count = count + 1

plt.legend()
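
To see what the per-state cumulative sum does when states have different numbers of rows, here is a minimal sketch on made-up data. The frame `toy` and its values are hypothetical and are not taken from the asker's CSV; state "B" deliberately has no row for 2020-05-02.

```python
import pandas as pd

# Hypothetical toy data (not the asker's file): "B" has no row for 2020-05-02,
# and the rows are deliberately out of date order.
toy = pd.DataFrame({
    "Date": ["2020-05-02", "2020-05-01", "2020-05-01", "2020-05-03", "2020-05-03"],
    "State": ["A", "A", "B", "A", "B"],
    "Confirmed": [5, 10, 3, 2, 4],
})

# Parse the dates and sort, so the running total follows calendar order.
toy["Date"] = pd.to_datetime(toy["Date"])
toy = toy.sort_values("Date").reset_index(drop=True)

# Running total of daily 'Confirmed' within each state.
toy["Total Confirmed"] = toy.groupby("State")["Confirmed"].cumsum()

totals_a = toy.loc[toy["State"] == "A", "Total Confirmed"].tolist()
totals_b = toy.loc[toy["State"] == "B", "Total Confirmed"].tolist()
print(totals_a)
print(totals_b)
```

Each state accumulates only over its own rows, so a missing date in one state does not affect the totals of another.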

Answer 2

I was able to achieve the outcome I was looking for with the code below. However, I am sure this is not the most elegant way of doing it, and I am still looking for more intuitive alternatives.

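import matplotlib.pyplot as plt

# Sum 'Confirmed' per (Date, State), then pivot so each state becomes a column
grouped = cdata.groupby(['Date', 'State'], sort=False)['Confirmed'].sum().unstack('State')
grouped.reset_index(inplace=True)
# [1:-1] skips the leading 'Date' column and also leaves out the last state column
columns = grouped.columns.to_list()[1:-1]
fig, ax = plt.subplots(figsize=(20, 14))
grouped.plot(x='Date', y=columns, ax=ax)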
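
One possible alternative to the groupby/unstack steps above is `pivot_table`, which builds the same wide layout (one row per date, one column per state) in a single call. This is only a sketch under assumptions: the frame `sample` is hypothetical data shaped like `cdata`, and the forward fill is reasonable only if 'Confirmed' is already a cumulative figure, so that a state with no row for a date can carry its previous value.

```python
import pandas as pd

# Hypothetical sample shaped like cdata; "B" has no row for 2020-05-02.
sample = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-05-01", "2020-05-01", "2020-05-02", "2020-05-03", "2020-05-03"]
    ),
    "State": ["A", "B", "A", "A", "B"],
    "Confirmed": [10, 3, 15, 17, 7],
})

# One row per date, one column per state; duplicate (Date, State) rows are summed.
wide = sample.pivot_table(
    index="Date", columns="State", values="Confirmed", aggfunc="sum"
)

# Assumption: 'Confirmed' is cumulative, so a gap is filled with the last known value.
wide = wide.sort_index().ffill()

print(wide)
```

With matplotlib installed, `wide.plot(figsize=(20, 14))` then draws one line per state against the date index, without building the list of columns by hand.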

Answer 1
0 2020-05-07 09:53:34

Answer 2
0 2020-05-08 07:23:05