How to manually set a date index and fill missing rows with zeros in a Python pandas DataFrame

Question

I have the dataset below, and a parameter that holds the current date:

product_name    serial_number     date           sum
"A"             "12"              "2020-01-01"   150        
"A"             "12"              "2020-01-02"   350
"A"             "12"              "2020-01-05"   550
"A"             "12"              "2020-01-10"   1500

As an example, take current_date as "2020-01-15". I am trying to build the index manually, from the minimum date in the dataset ("2020-01-01") up to current_date ("2020-01-15"), and output a dataframe in which the missing dates are filled with zeros:

product_name    serial_number     date           sum
"A"             "12"              "2020-01-01"   150
"A"             "12"              "2020-01-02"   350
"A"             "12"              "2020-01-03"   0
"A"             "12"              "2020-01-04"   0
"A"             "12"              "2020-01-05"   550
"A"             "12"              "2020-01-06"   0
"A"             "12"              "2020-01-07"   0
"A"             "12"              "2020-01-08"   0
"A"             "12"              "2020-01-09"   0
"A"             "12"              "2020-01-10"   1500
"A"             "12"              "2020-01-11"   0
"A"             "12"              "2020-01-12"   0
"A"             "12"              "2020-01-13"   0
"A"             "12"              "2020-01-14"   0
"A"             "12"              "2020-01-15"   0
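
For anyone who wants to reproduce this, here is a minimal sketch of the input data. It assumes serial_number is stored as a string (as the quotes above suggest) and that date is parsed to datetime, which the table itself does not show.

```python
import pandas as pd

# Sample data from the question. The dates are parsed to datetime so they
# can later be matched against a DatetimeIndex.
df = pd.DataFrame({
    "product_name": ["A", "A", "A", "A"],
    "serial_number": ["12", "12", "12", "12"],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-05", "2020-01-10"]
    ),
    "sum": [150, 350, 550, 1500],
})

# The "current date" parameter from the question.
current_date = pd.Timestamp("2020-01-15")
```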

Answer 1

Use DataFrame.pivot, then DataFrame.reindex against a date_range, then DataFrame.stack:

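import pandas as pd

# dates must be datetimes so they match the DatetimeIndex built by date_range
df['date'] = pd.to_datetime(df['date'])

current_date = '2020-01-15'
# to use today's date dynamically instead:
# current_date = pd.to_datetime('today')
r = pd.date_range(df['date'].min(), current_date, name='date')
cols = ['product_name', 'serial_number']

# pivot takes keyword arguments only in pandas 2.0 and later
df = (df.pivot(index=cols, columns='date', values='sum')
        .reindex(r, axis=1, fill_value=0)
        .stack()
        .reset_index(name='sum'))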
   product_name  serial_number       date   sum
0             A             12 2020-01-01   150
1             A             12 2020-01-02   350
2             A             12 2020-01-03     0
3             A             12 2020-01-04     0
4             A             12 2020-01-05   550
5             A             12 2020-01-06     0
6             A             12 2020-01-07     0
7             A             12 2020-01-08     0
8             A             12 2020-01-09     0
9             A             12 2020-01-10  1500
10            A             12 2020-01-11     0
11            A             12 2020-01-12     0
12            A             12 2020-01-13     0
13            A             12 2020-01-14     0
14            A             12 2020-01-15     0
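
The step that does the real work here is reindex against a complete date_range with fill_value=0. As a minimal sketch on a single Series (toy values chosen for illustration, not the question's full data):

```python
import pandas as pd

# Two observations with a gap between them.
s = pd.Series(
    [150, 550],
    index=pd.to_datetime(["2020-01-01", "2020-01-03"]),
)

# Every calendar day from the first observation to the chosen end date.
full_range = pd.date_range("2020-01-01", "2020-01-04", name="date")

# Dates missing from s are added and given the value 0.
filled = s.reindex(full_range, fill_value=0)
print(filled.tolist())  # [150, 0, 550, 0]
```

Because fill_value is supplied, no NaN is introduced and the integer dtype is kept.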

Or use DataFrame.set_index, then DataFrame.reindex against a MultiIndex.from_product:

import pandas as pd

# dates must be datetimes so they match the DatetimeIndex built by date_range
df['date'] = pd.to_datetime(df['date'])

current_date = '2020-01-15'
# to use today's date dynamically instead:
# current_date = pd.to_datetime('today')
r = pd.date_range(df['date'].min(), current_date)
mux = pd.MultiIndex.from_product([df['product_name'].unique(),
                                  df['serial_number'].unique(),
                                  r], names=['product_name', 'serial_number', 'date'])
df = (df.set_index(['product_name', 'serial_number', 'date'])
        .reindex(mux, fill_value=0)
        .reset_index())

For a more dynamic solution, collect the unique values of each key column with a list comprehension:

import pandas as pd

# dates must be datetimes so they match the DatetimeIndex built by date_range
df['date'] = pd.to_datetime(df['date'])

current_date = '2020-01-15'
# to use today's date dynamically instead:
# current_date = pd.to_datetime('today')
r = pd.date_range(df['date'].min(), current_date)
cols = ['product_name', 'serial_number']

# one array of unique values per key column, plus the full date range
uniq = [df[x].unique() for x in cols]
mux = pd.MultiIndex.from_product(uniq + [r], names=cols + ['date'])
df = (df.set_index(cols + ['date'])
        .reindex(mux, fill_value=0)
        .reset_index())

print(df)
   product_name  serial_number       date   sum
0             A             12 2020-01-01   150
1             A             12 2020-01-02   350
2             A             12 2020-01-03     0
3             A             12 2020-01-04     0
4             A             12 2020-01-05   550
5             A             12 2020-01-06     0
6             A             12 2020-01-07     0
7             A             12 2020-01-08     0
8             A             12 2020-01-09     0
9             A             12 2020-01-10  1500
10            A             12 2020-01-11     0
11            A             12 2020-01-12     0
12            A             12 2020-01-13     0
13            A             12 2020-01-14     0
14            A             12 2020-01-15     0
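
One caveat, offered as a sketch rather than as part of the answer above: with several products, MultiIndex.from_product crosses every unique product_name with every unique serial_number, so it also creates pairs that never occur in the data. If only the existing pairs should be expanded, a cross merge can be used instead of reindex. The two-product data below is made up for illustration, and how="cross" assumes pandas 1.2 or newer.

```python
import pandas as pd

# Hypothetical data with two product/serial pairs.
df = pd.DataFrame({
    "product_name": ["A", "A", "B"],
    "serial_number": ["12", "12", "34"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-02"]),
    "sum": [150, 550, 70],
})
current_date = "2020-01-03"
cols = ["product_name", "serial_number"]

r = pd.date_range(df["date"].min(), current_date, name="date")

# Cross only the pairs that actually occur with every date in the range.
pairs = df[cols].drop_duplicates()
full = pairs.merge(r.to_frame(index=False), how="cross")

# Bring in the known sums; dates with no match become 0.
out = (full.merge(df, on=cols + ["date"], how="left")
           .fillna({"sum": 0})
           .astype({"sum": "int64"}))
print(out)
```

Here out has six rows (two pairs times three dates), whereas from_product on the same data would produce twelve.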

Solution 1
1 2021-01-13 05:25:39