Groupby time series: fill missing data with 0
Given a pandas time series DataFrame grouped by 'UUT':
df
Out[64]:
UUT Sum
Date_Time
2017-04-28 18:48:16 uut-01 2
2017-04-28 18:48:18 uut-02 2
2017-04-28 18:48:19 uut-03 2
I want to use reindex to create a time series at 1-second intervals and fill the gaps with 0 in the Sum column only, similar to what is shown below:
df
Out[64]:
UUT Sum
Date_Time
2017-04-28 18:48:16 uut-01 2
2017-04-28 18:48:16 uut-02 0
2017-04-28 18:48:16 uut-03 0
2017-04-28 18:48:17 uut-01 0
2017-04-28 18:48:17 uut-02 0
2017-04-28 18:48:17 uut-03 0
2017-04-28 18:48:18 uut-01 0
2017-04-28 18:48:18 uut-02 2
2017-04-28 18:48:18 uut-03 0
2017-04-28 18:48:19 uut-01 0
2017-04-28 18:48:19 uut-02 0
2017-04-28 18:48:19 uut-03 2
I used reindex, but it filled both 'UUT' and 'Sum' with zeros. How do I fill the missing timestamps in the 'UUT' column with the uut names instead of zeros, and fill zeros into the 'Sum' column only?
idx = pd.date_range('2017-04-28 18:48:16', '2017-04-28 18:48:19', freq='1s')
grouped = df.groupby('UUT')
grouped.get_group('uut-01').reindex(idx, fill_value=0)
grouped.get_group('uut-01')
2017-04-28 18:48:16 uut-01 2
2017-04-28 18:48:17 0 0
2017-04-28 18:48:18 0 0
2017-04-28 18:48:19 0 0
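The output above follows from how `fill_value` works: `reindex(idx, fill_value=0)` writes the same value into every column of each new row, so 'UUT' receives 0 as well. A minimal sketch of a per-column alternative for a single group is shown here, assuming the sample data from the question (the variable name `one` is only illustrative): reindex without `fill_value`, then restore the label and fill 'Sum' separately.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame(
    {"UUT": ["uut-01", "uut-02", "uut-03"], "Sum": [2, 2, 2]},
    index=pd.DatetimeIndex(
        ["2017-04-28 18:48:16", "2017-04-28 18:48:18", "2017-04-28 18:48:19"],
        name="Date_Time",
    ),
)

idx = pd.date_range(
    "2017-04-28 18:48:16", "2017-04-28 18:48:19", freq="1s", name="Date_Time"
)

# Reindex without fill_value: the new rows hold NaN in every column.
one = df[df["UUT"] == "uut-01"].reindex(idx)

# Fill each column with its own value.
one["UUT"] = "uut-01"                          # restore the group label
one["Sum"] = one["Sum"].fillna(0).astype(int)  # zeros only in 'Sum'
```

This handles one group at a time; covering all groups in one step needs an index over both the timestamps and the UUT names.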
Based on Kyle's answer, I finally got it to work:
df = df.set_index([df.index, 'UUT'])
idx = pd.MultiIndex.from_product(df.index.levels, names=['Date_Time', 'UUT'])
df = df.reindex(index=idx, fill_value=0)
df = df.reset_index(level=[1])  # convert back to a single Date_Time index, with 'UUT' as a column again
grouped = df.groupby('UUT')
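One caveat with building the product from `df.index.levels`: the levels contain only timestamps that already occur in the data, so a second with no rows at all (18:48:17 in the sample) would still be absent from the result. A hedged sketch that generates the full 1-second range with `pd.date_range` instead is given below; the names `full_times` and `result` are illustrative and not part of the original post.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame(
    {"UUT": ["uut-01", "uut-02", "uut-03"], "Sum": [2, 2, 2]},
    index=pd.DatetimeIndex(
        ["2017-04-28 18:48:16", "2017-04-28 18:48:18", "2017-04-28 18:48:19"],
        name="Date_Time",
    ),
)

# Every second between the first and last timestamp, including empty ones.
full_times = pd.date_range(df.index.min(), df.index.max(), freq="1s")

# All (timestamp, UUT) combinations.
idx = pd.MultiIndex.from_product(
    [full_times, df["UUT"].unique()], names=["Date_Time", "UUT"]
)

result = (
    df.set_index("UUT", append=True)["Sum"]  # Series indexed by (Date_Time, UUT)
      .reindex(idx, fill_value=0)            # missing combinations get Sum = 0
      .reset_index(level="UUT")              # back to a single Date_Time index
)
```

Because 'UUT' lives in the index while reindexing, only 'Sum' is filled with 0, and each new row carries its UUT name.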
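df = df.set_index(['time', 'uut'])
# Build every (time, uut) combination from the unique values of each index level
idx = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
df.reindex(index=idx, fill_value=0)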
sum
18:48:16 uut-01 2
uut-02 0
uut-03 0
18:48:18 uut-01 0
uut-02 2
uut-03 0
18:48:19 uut-01 0
uut-02 0
uut-03 2