
Extend/Fill Time Series Data with zeros and constant values in Pandas with Python 3.x

I have a problem extending time series data. I have the following DataFrame:

date_first = df1['date'].min()  # is 2016-08-08
date_last = df1['date'].max()  # is 2016-08-20

>>> df1
         date         customer     qty
149481   2016-08-08   A            400
161933   2016-08-10   A            200
167172   2016-08-13   B            900
170296   2016-08-15   A            300
178221   2016-08-20   B            150
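To make the examples reproducible, the frame can be rebuilt from the printout above. This is a minimal sketch: the row labels and values are copied from the printout, and storing `date` as a real datetime column (rather than as strings) is an assumption.

```python
import pandas as pd

# Rebuild df1 from the printout; the original row labels are kept as the index
df1 = pd.DataFrame(
    {
        'date': pd.to_datetime(['2016-08-08', '2016-08-10', '2016-08-13',
                                '2016-08-15', '2016-08-20']),
        'customer': ['A', 'A', 'B', 'A', 'B'],
        'qty': [400, 200, 900, 300, 150],
    },
    index=[149481, 161933, 167172, 170296, 178221],
)

date_first = df1['date'].min()  # Timestamp('2016-08-08')
date_last = df1['date'].max()   # Timestamp('2016-08-20')
print(df1)
```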

Next, I set `date` as the index and get the following frame:

df1.set_index('date', inplace=True)

>>> df1
             customer     qty
date
2016-08-08   A            400
2016-08-10   A            200
2016-08-13   B            900
2016-08-15   A            300
2016-08-20   B            150

Now I try to extend each customer's time series so that it runs from the earliest date to the latest date, like this:

ix = pd.DataFrame({on_column: pd.Series([date_first, date_last]), 'qty': 0})
result = df1.reindex(ix)

This doesn't give me the result I expect. I want it to look like the following frame:

>>> df1
          date customer  qty
0   2016-08-08        A  400
1   2016-08-08        B    0
2   2016-08-09        A    0
3   2016-08-09        B    0
4   2016-08-10        A  200
5   2016-08-10        B    0
...
24  2016-08-20        A    0
25  2016-08-20        B  150
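Part of the problem is that `reindex` conforms a frame to a given set of index labels; it does not build rows from a template frame such as `ix`. Reindexing the date-only index against a full date range is not enough either, because each missing day is added only once. The sketch below (rebuilding `df1` from the printout, with `date` as datetime, which is an assumption) ends up with 13 rows, one per calendar day, instead of one row per day and customer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': pd.to_datetime(['2016-08-08', '2016-08-10', '2016-08-13',
                            '2016-08-15', '2016-08-20']),
    'customer': ['A', 'A', 'B', 'A', 'B'],
    'qty': [400, 200, 900, 300, 150],
}).set_index('date')

# Every calendar day from the first to the last date
all_days = pd.date_range(df1.index.min(), df1.index.max(), freq='D')

# Reindexing on the date alone: missing days are added (customer and qty are NaN),
# but only once each, so customers A and B do not each get a row per day
by_day = df1.reindex(all_days)
print(len(by_day))  # 13 rows, not 13 days x 2 customers = 26
print(by_day.head())
```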

Use `set_index` to build a MultiIndex from both columns, then `reindex` it against the full grid of dates and customers created with `MultiIndex.from_product`:

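import pandas as pd

# df1 is the original frame with 'date' as a regular column (before set_index);
# convert it to datetime so it matches the date_range level of the new index
df1['date'] = pd.to_datetime(df1['date'])
date_first = df1['date'].min()
date_last = df1['date'].max()

# Cartesian product: every calendar day x every customer
mux = pd.MultiIndex.from_product([pd.date_range(date_first, date_last, freq='D'),
                                  df1['customer'].unique()], names=['date', 'customer'])
print(mux)
# Missing (date, customer) pairs are added with qty = 0
result = df1.set_index(['date', 'customer']).reindex(mux, fill_value=0).reset_index()
print(result)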
         date customer  qty
0  2016-08-08        A  400
1  2016-08-08        B    0
2  2016-08-09        A    0
3  2016-08-09        B    0
4  2016-08-10        A  200
5  2016-08-10        B    0
6  2016-08-11        A    0
7  2016-08-11        B    0
8  2016-08-12        A    0
9  2016-08-12        B    0
10 2016-08-13        A    0
11 2016-08-13        B  900
12 2016-08-14        A    0
13 2016-08-14        B    0
14 2016-08-15        A  300
15 2016-08-15        B    0
16 2016-08-16        A    0
17 2016-08-16        B    0
18 2016-08-17        A    0
19 2016-08-17        B    0
20 2016-08-18        A    0
21 2016-08-18        B    0
22 2016-08-19        A    0
23 2016-08-19        B    0
24 2016-08-20        A    0
25 2016-08-20        B  150
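A quick sanity check on this result: the grid should contain 13 days × 2 customers = 26 rows (index 0 to 25 above), and because reindexing only adds zero rows, the total `qty` should stay at 400 + 200 + 900 + 300 + 150 = 1950. A minimal sketch of that check, rebuilding `df1` from the question with `date` as datetime (an assumption):

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': pd.to_datetime(['2016-08-08', '2016-08-10', '2016-08-13',
                            '2016-08-15', '2016-08-20']),
    'customer': ['A', 'A', 'B', 'A', 'B'],
    'qty': [400, 200, 900, 300, 150],
})

days = pd.date_range(df1['date'].min(), df1['date'].max(), freq='D')
customers = df1['customer'].unique()
mux = pd.MultiIndex.from_product([days, customers], names=['date', 'customer'])

result = df1.set_index(['date', 'customer']).reindex(mux, fill_value=0).reset_index()

# from_product builds the Cartesian product: one row per (day, customer)
assert len(result) == len(days) * len(customers)  # 13 * 2 = 26
# Only zero rows were added, so the total quantity is preserved
assert result['qty'].sum() == df1['qty'].sum()    # 1950
print(len(result), result['qty'].sum())
```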

Here is my solution wrapped in a function:

import pandas as pd


# Originally a @staticmethod inside a class; shown here as a plain function
def extend_time_series_data(data, date_column, customer_column, qty_column):
    # Discard the old row labels (e.g. 149481, ...) and work on a new frame
    data = data.reset_index(drop=True)
    # Convert types before taking min()/max(), so the bounds are real dates
    data[date_column] = pd.to_datetime(data[date_column])
    data[qty_column] = pd.to_numeric(data[qty_column])
    date_first = data[date_column].min()
    date_last = data[date_column].max()

    # Every calendar day x every customer
    mux = pd.MultiIndex.from_product([pd.date_range(date_first, date_last, freq='D'),
                                      data[customer_column].unique()],
                                     names=[date_column, customer_column])
    # Missing (date, customer) pairs are filled with 0
    result = data.set_index([date_column, customer_column]).reindex(mux, fill_value=0).reset_index()
    print('Extending time series data was successful!')
    return result
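One caveat: `reindex` needs unique labels, so if the raw data can hold several rows for the same customer on the same day, the `set_index(...).reindex(...)` step raises a `ValueError`. A hedged sketch that collapses such rows with `groupby(...).sum()` before reindexing; `raw` is a hypothetical example frame, and summing duplicates is an assumption about what the quantities mean.

```python
import pandas as pd

# Hypothetical raw data: customer A has two rows on 2016-08-08
raw = pd.DataFrame({
    'date': pd.to_datetime(['2016-08-08', '2016-08-08', '2016-08-10']),
    'customer': ['A', 'A', 'B'],
    'qty': [400, 100, 200],
})

# Collapse duplicates so each (date, customer) label is unique
daily = raw.groupby(['date', 'customer'])['qty'].sum()

mux = pd.MultiIndex.from_product(
    [pd.date_range(raw['date'].min(), raw['date'].max(), freq='D'),
     raw['customer'].unique()],
    names=['date', 'customer'])

# Same reindex step as above, now on a Series with unique labels
result = daily.reindex(mux, fill_value=0).reset_index()
print(result)
```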

Maybe it will help someone with a similar problem.



 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM