How to set a date index manually and fill zeros for missing rows in a Python pandas DataFrame

I have the dataset shown below, and a parameter that holds the current date:

product_name    serial_number     date           sum
"A"             "12"              "2020-01-01"   150        
"A"             "12"              "2020-01-02"   350
"A"             "12"              "2020-01-05"   550
"A"             "12"              "2020-01-10"   1500

As an example, take current_date to be "2020-01-15". I am trying to build the index manually so that it runs from the minimum date in the dataset ("2020-01-01") to current_date ("2020-01-15"), and to output a DataFrame in which the missing dates are filled with zeros:

product_name    serial_number     date           sum
    "A"             "12"          "2020-01-01"   150        
    "A"             "12"          "2020-01-02"   350
    "A"             "12"          "2020-01-03"   0
    "A"             "12"          "2020-01-04"   0 
    "A"             "12"          "2020-01-05"   550
    "A"             "12"          "2020-01-06"   0        
    "A"             "12"          "2020-01-07"   0
    "A"             "12"          "2020-01-08"   0
    "A"             "12"          "2020-01-09"   0 
    "A"             "12"          "2020-01-10"   1500 
    "A"             "12"          "2020-01-11"   0        
    "A"             "12"          "2020-01-12"   0
    "A"             "12"          "2020-01-13"   0
    "A"             "12"          "2020-01-14"   0 
    "A"             "12"          "2020-01-15"   0 

Use DataFrame.pivot, then DataFrame.reindex over a date_range along the columns, and finally DataFrame.stack:

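import pandas as pd

# dates must be datetime64 so they align with the date_range below
df['date'] = pd.to_datetime(df['date'])

current_date = '2020-01-15'
# to use today's date instead:
# current_date = pd.Timestamp.today().normalize()
r = pd.date_range(df['date'].min(), current_date, name='date')
cols = ['product_name', 'serial_number']

# keyword arguments are used because recent pandas versions
# do not accept positional arguments in pivot
df = (df.pivot(index=cols, columns='date', values='sum')
        .reindex(r, axis=1, fill_value=0)
        .stack()
        .reset_index(name='sum'))

print(df)
   product_name  serial_number       date   sum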
0             A             12 2020-01-01   150
1             A             12 2020-01-02   350
2             A             12 2020-01-03     0
3             A             12 2020-01-04     0
4             A             12 2020-01-05   550
5             A             12 2020-01-06     0
6             A             12 2020-01-07     0
7             A             12 2020-01-08     0
8             A             12 2020-01-09     0
9             A             12 2020-01-10  1500
10            A             12 2020-01-11     0
11            A             12 2020-01-12     0
12            A             12 2020-01-13     0
13            A             12 2020-01-14     0
14            A             12 2020-01-15     0

Or use DataFrame.set_index, then DataFrame.reindex with an index built by MultiIndex.from_product:

# dates must be datetime64 so they match the date_range below
df['date'] = pd.to_datetime(df['date'])

current_date = '2020-01-15'
# to use today's date instead:
# current_date = pd.Timestamp.today().normalize()
r = pd.date_range(df['date'].min(), current_date)
mux = pd.MultiIndex.from_product([df['product_name'].unique(),
                                  df['serial_number'].unique(),
                                  r], names=['product_name', 'serial_number', 'date'])
df = (df.set_index(['product_name', 'serial_number', 'date'])
        .reindex(mux, fill_value=0)
        .reset_index())

For a more dynamic solution, collect the unique values of each key column with a list comprehension:

# dates must be datetime64 so they match the date_range below
df['date'] = pd.to_datetime(df['date'])

current_date = '2020-01-15'
# to use today's date instead:
# current_date = pd.Timestamp.today().normalize()
r = pd.date_range(df['date'].min(), current_date)
cols = ['product_name', 'serial_number']

# one array of unique values per key column
uniq = [df[x].unique() for x in cols]
mux = pd.MultiIndex.from_product(uniq + [r], names=cols + ['date'])
df = (df.set_index(cols + ['date'])
        .reindex(mux, fill_value=0)
        .reset_index())

print(df)
   product_name  serial_number       date   sum
0             A             12 2020-01-01   150
1             A             12 2020-01-02   350
2             A             12 2020-01-03     0
3             A             12 2020-01-04     0
4             A             12 2020-01-05   550
5             A             12 2020-01-06     0
6             A             12 2020-01-07     0
7             A             12 2020-01-08     0
8             A             12 2020-01-09     0
9             A             12 2020-01-10  1500
10            A             12 2020-01-11     0
11            A             12 2020-01-12     0
12            A             12 2020-01-13     0
13            A             12 2020-01-14     0
14            A             12 2020-01-15     0
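If it helps to try this end to end, here is a minimal self-contained sketch of the MultiIndex.from_product approach, wrapped in a function. The name fill_missing_dates, its cols parameter, and the sample frame are only illustrative assumptions; the sample rebuilds the data from the question with serial_number stored as a string.

```python
import pandas as pd


def fill_missing_dates(df, current_date, cols=('product_name', 'serial_number')):
    """Hypothetical helper: one row per key combination per day, zeros for gaps."""
    cols = list(cols)
    out = df.copy()
    # Parse dates so they can be aligned with the DatetimeIndex below.
    out['date'] = pd.to_datetime(out['date'])
    # Daily range from the earliest date in the data up to current_date.
    r = pd.date_range(out['date'].min(), pd.Timestamp(current_date).normalize())
    # Cartesian product of the unique key values and every day in the range.
    uniq = [out[c].unique() for c in cols]
    mux = pd.MultiIndex.from_product(uniq + [r], names=cols + ['date'])
    return (out.set_index(cols + ['date'])
               .reindex(mux, fill_value=0)
               .reset_index())


# Sample data from the question (serial_number as a string is an assumption).
sample = pd.DataFrame({
    'product_name': ['A', 'A', 'A', 'A'],
    'serial_number': ['12', '12', '12', '12'],
    'date': ['2020-01-01', '2020-01-02', '2020-01-05', '2020-01-10'],
    'sum': [150, 350, 550, 1500],
})

result = fill_missing_dates(sample, '2020-01-15')
print(result)
```

Note that from_product builds every combination of the unique product_name and serial_number values, so with several products it also creates rows for product/serial pairs that never occur together in the data. If that is not wanted, the index would need to be built from the existing pairs instead.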
