How do I fill in missing values using pandas reindex with multiple index levels?

Question

I am trying to generate a set of summary statistics for each day in a dataset. Specifically, I want to know the percentage of time spent within, above, and below a certain range of values.

Starting example df:

date                   value
2022-05-01 17:03:45    120
2022-05-02 17:08:45    55
2022-05-03 17:13:45    230
2022-05-04 17:18:45    285
2022-05-05 17:23:45    41

I then make a new column with the following conditions:

df['range'] = ['extreme low' if bgl <= 54 else 'low' if bgl < 70 else 'extreme high' if bgl > 250 else 'high' if bgl >= 180 else 'in range' for bgl in df['value']]
df.head()

date                   value    range
2022-05-01 17:03:45    120      in range
2022-05-02 17:08:45    55       low
2022-05-03 17:13:45    230      high
2022-05-04 17:18:45    285      extreme high
2022-05-05 17:23:45    41       extreme low

The issue:

There are some days where, for example, there are no values in the extreme low category. Even then, I would still like to see `extreme low: 0` in my summary statistics.

When I group by date and use value_counts() and reindex(), my results are very close to what I want. However, even with fill_value=0, I don't get a row with "0":

categories = ['extreme low', 'low', 'in range', 'high', 'extreme high']

daily_summaries = df.groupby(pd.Grouper(key='date', axis=0, freq='D'))['range'].value_counts(normalize=True).reindex(categories, level=1, fill_value=0).mul(100).round(1)
print(daily_summaries)

Resulting in:

date        range       
2022-05-02  low              2.7
            in range        77.8
            high            13.6
            extreme high     5.9

My desired output is this:

date        range       
2022-05-02  extreme low        0
            low              2.7
            in range        77.8
            high            13.6
            extreme high     5.9

I hope that makes sense. Any help or advice would be greatly appreciated. I'm sure I'm missing something rather simple, but I can't seem to figure it out. Thank you so much in advance!

Answer 1

For the first step, you can use pd.cut:

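import numpy as np
import pandas as pd

# Bins are right-closed by default, so 70 is labelled 'low' and 180 'in range'.
df['range'] = pd.cut(df.value,
                     bins=[0, 54, 70, 180, 250, np.inf],
                     labels=['extreme low', 'low', 'in range', 'high', 'extreme high'])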
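
As a quick check of how these bins treat the boundary values, here is a minimal sketch. The sample values are made up for illustration. The second set of bin edges is my own assumption for integer readings only, chosen so that 70 lands in 'in range' and 180 in 'high', as the conditions in the question do.

```python
import numpy as np
import pandas as pd

labels = ['extreme low', 'low', 'in range', 'high', 'extreme high']

# Hypothetical boundary values, chosen to probe each bin edge.
values = pd.Series([54, 55, 70, 180, 250, 251])

# Bins from the answer: with the default right=True each bin includes its right edge.
binned = pd.cut(values, bins=[0, 54, 70, 180, 250, np.inf], labels=labels)
print(binned.tolist())

# Assumed alternative for integer data: shift two edges down by one so that
# 70 falls in 'in range' and 180 falls in 'high'.
int_binned = pd.cut(values, bins=[-np.inf, 54, 69, 179, 250, np.inf], labels=labels)
print(int_binned.tolist())
```

With float data the shifted edges would misplace values such as 69.5, so the list comprehension from the question may be the safer choice there.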

For the second step, use pd.crosstab and reindex the columns:

out = pd.crosstab(df['date'].dt.date, df['range']).reindex(categories, axis=1, fill_value=0).stack()
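
The crosstab above returns counts. To get the per-day percentages the question asks for, one option is crosstab's normalize='index' argument. This is only a sketch: the sample rows are invented and already carry their 'range' labels.

```python
import pandas as pd

categories = ['extreme low', 'low', 'in range', 'high', 'extreme high']

# Hypothetical sample data with the 'range' labels already assigned.
df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01 08:00', '2022-05-01 12:00',
                            '2022-05-01 17:00', '2022-05-02 09:00']),
    'range': ['in range', 'in range', 'high', 'low'],
})

pct = (
    pd.crosstab(df['date'].dt.date, df['range'], normalize='index')  # row-wise fractions
      .reindex(categories, axis=1, fill_value=0)  # add labels that never occur
      .mul(100)
      .round(1)
      .stack()  # back to one row per (date, range) pair
)
print(pct)
```

Because crosstab builds a full date-by-label table, every day gets a value for every label, including 0 for labels that did not occur that day.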

Answer 2

Creating the DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['2022-05-01', '2022-05-01', '2022-05-02', '2022-05-02', '2022-05-03'],
    'value': [120, 55, 230, 285, 41]
})
df.date = pd.to_datetime(df.date)

Categorizing the range:

# Thresholds follow the question: 'high' runs from 180 to 250 inclusive.
df['range'] = 'extreme low'
df['range'] = np.where((df.value > 54) & (df.value < 70), 'low', df['range'])
df['range'] = np.where((df.value >= 70) & (df.value < 180), 'in range', df['range'])
df['range'] = np.where((df.value >= 180) & (df.value <= 250), 'high', df['range'])
df['range'] = np.where(df.value > 250, 'extreme high', df['range'])

Next, group by date and range to build a table of counts:

count_df = df.groupby(['date','range']).size().reset_index(name='counts')
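
A plain groupby only returns the date/range pairs that actually occur. A possible alternative, sketched here with made-up labels and not part of the original answer, is to store 'range' as a categorical with all five labels and group with observed=False, so that unobserved labels show up with a count of 0.

```python
import pandas as pd

categories = ['extreme low', 'low', 'in range', 'high', 'extreme high']

# Hypothetical sample data with the 'range' labels already assigned.
df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01', '2022-05-01', '2022-05-02',
                            '2022-05-02', '2022-05-03']),
    'range': ['in range', 'low', 'high', 'extreme high', 'extreme low'],
})

# Declare every possible label up front, even those absent from the data.
df['range'] = pd.Categorical(df['range'], categories=categories, ordered=True)

# observed=False keeps unobserved categories, so each date gets all five labels.
counts = df.groupby(['date', 'range'], observed=False).size()
print(counts)
```

This also keeps the labels in their declared order instead of alphabetical order.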

Finally, pivot the table so that missing date/range combinations are filled with 0:

pd.pivot_table(count_df,
               index=['date', 'range'],
               values='counts',
               fill_value=0,
               dropna=False,
               aggfunc='sum')
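
To stay closer to the reindex approach in the question, another option is to build the full set of (date, range) pairs with pd.MultiIndex.from_product and reindex against that, since reindexing on one level did not add the missing rows in the question. This is a minimal sketch on invented data; the names `shares`, `days`, and `full_index` are only for illustration.

```python
import pandas as pd

categories = ['extreme low', 'low', 'in range', 'high', 'extreme high']

# Hypothetical sample data with the 'range' labels already assigned.
df = pd.DataFrame({
    'date': pd.to_datetime(['2022-05-01 08:00', '2022-05-01 12:00',
                            '2022-05-01 17:00', '2022-05-02 09:00']),
    'range': ['in range', 'in range', 'high', 'low'],
})

# Per-day share of each label; only labels that occur on a day get a row.
shares = df.groupby(pd.Grouper(key='date', freq='D'))['range'].value_counts(normalize=True)

# Build every (day, label) pair explicitly, then reindex against the full index.
days = shares.index.unique(level=0)
full_index = pd.MultiIndex.from_product([days, categories], names=['date', 'range'])
daily_summaries = shares.reindex(full_index, fill_value=0).mul(100).round(1)
print(daily_summaries)
```

Unlike the pivot, this includes labels that never appear anywhere in the data, because the label list comes from `categories` and not from the observed values.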

