How to set a date index manually and fill zeros for missing rows in a Python pandas DataFrame
I have the dataset below, and a parameter that holds the current date:
product_name serial_number date sum
"A" "12" "2020-01-01" 150
"A" "12" "2020-01-02" 350
"A" "12" "2020-01-05" 550
"A" "12" "2020-01-10" 1500
As an example, take current_date as "2020-01-15". I am trying to set the index manually so that it runs from the minimum date in the dataset ("2020-01-01") to current_date ("2020-01-15"), and to output a DataFrame in which the missing dates are filled with zeros:
product_name serial_number date sum
"A" "12" "2020-01-01" 150
"A" "12" "2020-01-02" 350
"A" "12" "2020-01-03" 0
"A" "12" "2020-01-04" 0
"A" "12" "2020-01-05" 550
"A" "12" "2020-01-06" 0
"A" "12" "2020-01-07" 0
"A" "12" "2020-01-08" 0
"A" "12" "2020-01-09" 0
"A" "12" "2020-01-10" 1500
"A" "12" "2020-01-11" 0
"A" "12" "2020-01-12" 0
"A" "12" "2020-01-13" 0
"A" "12" "2020-01-14" 0
"A" "12" "2020-01-15" 0
Use pivot with DataFrame.reindex by date_range and DataFrame.stack:
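import pandas as pd

# the dates must be datetimes, not strings, to match the date_range below
df['date'] = pd.to_datetime(df['date'])

current_date = '2020-01-15'
# to use today's date dynamically instead:
# current_date = pd.to_datetime('today')
r = pd.date_range(df['date'].min(), current_date, name='date')
cols = ['product_name', 'serial_number']
# keyword arguments are required by recent pandas versions
df = (df.pivot(index=cols, columns='date', values='sum')
        .reindex(r, axis=1, fill_value=0)
        .stack()
        .reset_index(name='sum'))
print(df)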
product_name serial_number date sum
0 A 12 2020-01-01 150
1 A 12 2020-01-02 350
2 A 12 2020-01-03 0
3 A 12 2020-01-04 0
4 A 12 2020-01-05 550
5 A 12 2020-01-06 0
6 A 12 2020-01-07 0
7 A 12 2020-01-08 0
8 A 12 2020-01-09 0
9 A 12 2020-01-10 1500
10 A 12 2020-01-11 0
11 A 12 2020-01-12 0
12 A 12 2020-01-13 0
13 A 12 2020-01-14 0
14 A 12 2020-01-15 0
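To see which rows the reindex has to create, it can help to compare the dates present in the data with the full daily range. This is only a minimal sketch: it rebuilds the question's sample data by hand, and the names expected_days and missing_days are my own, not part of the answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "product_name": ["A", "A", "A", "A"],
    "serial_number": ["12", "12", "12", "12"],
    "date": ["2020-01-01", "2020-01-02", "2020-01-05", "2020-01-10"],
    "sum": [150, 350, 550, 1500],
})

# Parse the date strings so they can be compared with a date range
df["date"] = pd.to_datetime(df["date"])

current_date = pd.Timestamp("2020-01-15")

# Every calendar day from the earliest date in the data up to current_date
expected_days = pd.date_range(df["date"].min(), current_date)

# Days in the range that have no row in the data (illustrative name)
missing_days = expected_days.difference(pd.DatetimeIndex(df["date"]))

print(len(expected_days), len(missing_days))
```

The days in missing_days are exactly the rows that end up with sum 0 in the reindexed result.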
Or use DataFrame.set_index with DataFrame.reindex by MultiIndex.from_product:
# the dates must be datetimes, not strings, to match the date_range below
df['date'] = pd.to_datetime(df['date'])

current_date = '2020-01-15'
# to use today's date dynamically instead:
# current_date = pd.to_datetime('today')
r = pd.date_range(df['date'].min(), current_date)
# every combination of product, serial number and day in the range
mux = pd.MultiIndex.from_product([df['product_name'].unique(),
                                  df['serial_number'].unique(),
                                  r], names=['product_name', 'serial_number', 'date'])
df = (df.set_index(['product_name', 'serial_number', 'date'])
        .reindex(mux, fill_value=0)
        .reset_index())
For a more dynamic solution, collect the unique values of each key column with a list comprehension:
# the dates must be datetimes, not strings, to match the date_range below
df['date'] = pd.to_datetime(df['date'])

current_date = '2020-01-15'
# to use today's date dynamically instead:
# current_date = pd.to_datetime('today')
r = pd.date_range(df['date'].min(), current_date)
cols = ['product_name', 'serial_number']
# unique values of each key column
uniq = [df[x].unique() for x in cols]
mux = pd.MultiIndex.from_product(uniq + [r], names=cols + ['date'])
df = (df.set_index(cols + ['date'])
        .reindex(mux, fill_value=0)
        .reset_index())
print(df)
product_name serial_number date sum
0 A 12 2020-01-01 150
1 A 12 2020-01-02 350
2 A 12 2020-01-03 0
3 A 12 2020-01-04 0
4 A 12 2020-01-05 550
5 A 12 2020-01-06 0
6 A 12 2020-01-07 0
7 A 12 2020-01-08 0
8 A 12 2020-01-09 0
9 A 12 2020-01-10 1500
10 A 12 2020-01-11 0
11 A 12 2020-01-12 0
12 A 12 2020-01-13 0
13 A 12 2020-01-14 0
14 A 12 2020-01-15 0
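One caveat with MultiIndex.from_product: it builds every combination of product_name and serial_number, so with several products it can create rows for pairs that never occur in the data. If that is not wanted, one possible variant is to reindex each existing pair separately. The sketch below is an assumption of mine rather than part of the answer above: fill_missing_dates is a hypothetical helper name, the second product "B" is made-up sample data, and the code assumes each pair has at most one row per date (reindex raises on duplicate dates).

```python
import pandas as pd

def fill_missing_dates(df, current_date, keys=("product_name", "serial_number")):
    """Hypothetical helper: zero-fill missing days for each existing key pair."""
    keys = list(keys)
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    # Daily range from the earliest date in the data up to current_date
    full_range = pd.date_range(out["date"].min(), current_date, name="date")

    pieces = []
    for key_values, group in out.groupby(keys):
        # Reindex this pair's values over the full range, filling gaps with 0
        filled = (group.set_index("date")["sum"]
                       .reindex(full_range, fill_value=0)
                       .reset_index())
        # Put the key columns back on every row of this piece
        for col, val in zip(keys, key_values):
            filled[col] = val
        pieces.append(filled)

    return pd.concat(pieces, ignore_index=True)[keys + ["date", "sum"]]

# Made-up example with two products that have different serial numbers
sample = pd.DataFrame({
    "product_name": ["A", "A", "B"],
    "serial_number": ["12", "12", "34"],
    "date": ["2020-01-01", "2020-01-03", "2020-01-02"],
    "sum": [150, 350, 70],
})
result = fill_missing_dates(sample, "2020-01-04")
print(result)
```

Here the result has 8 rows (2 existing pairs over 4 days), whereas the from_product approach would yield 16 rows, including the pairs A/34 and B/12 that do not exist in the data.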