Extend/Fill Time Series Data with zeros and constant values in Pandas with Python 3.x
I am having trouble extending time series data. I have the following DataFrame:
date_first = df1['date'].min() # is 2016-08-08
date_last = df1['date'].max() # is 2016-08-20
>>> df1
date customer qty
149481 2016-08-08 A 400
161933 2016-08-10 A 200
167172 2016-08-13 B 900
170296 2016-08-15 A 300
178221 2016-08-20 B 150
Next, I set the date column as the index and get the following frame:
df1.set_index('date', inplace=True)
>>> df1
customer qty
date
2016-08-08 A 400
2016-08-10 A 200
2016-08-13 B 900
2016-08-15 A 300
2016-08-20 B 150
Now I am trying to extend each customer's time series so that it covers every day from the earliest date to the latest date, like this:
ix = pd.DataFrame({on_column: pd.Series([date_first, date_last]), 'qty': 0})
result = df1.reindex(ix)
This does not give me the result I expect. I would like it to look like the following frame:
>>> df1
date customer qty
0 2016-08-08 A 400
1 2016-08-08 B 0
2 2016-08-09 A 0
3 2016-08-09 B 0
4 2016-08-10 A 200
5 2016-08-10 B 0
...
24 2016-08-20 A 0
25 2016-08-20 B 150
Use MultiIndex.from_product to build the full date/customer index, then reindex the original MultiIndex (created from the two columns by set_index) against it:
date_first = df1['date'].min()
date_last = df1['date'].max()
mux = pd.MultiIndex.from_product([pd.date_range(date_first, date_last, freq='D'),
                                  df1['customer'].unique()], names=['date', 'customer'])
print (mux)
result = df1.set_index(['date', 'customer']).reindex(mux, fill_value=0).reset_index()
print (result)
date customer qty
0 2016-08-08 A 400
1 2016-08-08 B 0
2 2016-08-09 A 0
3 2016-08-09 B 0
4 2016-08-10 A 200
5 2016-08-10 B 0
6 2016-08-11 A 0
7 2016-08-11 B 0
8 2016-08-12 A 0
9 2016-08-12 B 0
10 2016-08-13 A 0
11 2016-08-13 B 900
12 2016-08-14 A 0
13 2016-08-14 B 0
14 2016-08-15 A 300
15 2016-08-15 B 0
16 2016-08-16 A 0
17 2016-08-16 B 0
18 2016-08-17 A 0
19 2016-08-17 B 0
20 2016-08-18 A 0
21 2016-08-18 B 0
22 2016-08-19 A 0
23 2016-08-19 B 0
24 2016-08-20 A 0
25 2016-08-20 B 150
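For anyone who wants to run the answer without the original data, here is a minimal self-contained sketch. It rebuilds the question's five rows by hand (the original row labels such as 149481 are dropped, which is an assumption that they do not matter) and applies the same from_product / reindex steps.

```python
import pandas as pd

# Sample rows copied from the question; the original row labels are omitted.
df1 = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-08-08", "2016-08-10", "2016-08-13", "2016-08-15", "2016-08-20"]
    ),
    "customer": ["A", "A", "B", "A", "B"],
    "qty": [400, 200, 900, 300, 150],
})

# Every calendar day between the first and last date, crossed with every customer.
days = pd.date_range(df1["date"].min(), df1["date"].max(), freq="D")
mux = pd.MultiIndex.from_product(
    [days, df1["customer"].unique()], names=["date", "customer"]
)

# (date, customer) pairs missing from the data are created with qty = 0.
result = df1.set_index(["date", "customer"]).reindex(mux, fill_value=0).reset_index()

print(len(result))          # 26 (13 days x 2 customers)
print(result["qty"].sum())  # 1950, unchanged by the zero fill
```

Because fill_value=0 is passed to reindex, the qty column keeps its integer dtype instead of being upcast to float by NaN.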
Here is my solution, wrapped in a function:
@staticmethod
def extend_time_series_data(data, date_column, customer_column, qty_column):
    data = data.reset_index(drop=True)
    # Convert types first so that min()/max() compare real datetimes, not strings
    data[date_column] = pd.to_datetime(data[date_column])
    data[qty_column] = pd.to_numeric(data[qty_column])
    date_first = data[date_column].min()
    date_last = data[date_column].max()
    # Full grid: every day in the range crossed with every customer
    mux = pd.MultiIndex.from_product([pd.date_range(date_first, date_last, freq='D'),
                                      data[customer_column].unique()],
                                     names=[date_column, customer_column])
    # print(mux)
    result = data.set_index([date_column, customer_column]).reindex(mux, fill_value=0).reset_index()
    # print(result)
    print('Extending time series data was successful!')
    return result
Maybe it will help someone with a similar problem.
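One caveat with the function above: reindex raises a ValueError when the same (date, customer) pair occurs more than once, because the index it reindexes from must be unique. The sketch below is one possible workaround, not part of the original solution. The name extend_with_duplicates is hypothetical, and summing duplicate rows is an assumption about what the data means. It also returns only the date, customer, and quantity columns.

```python
import pandas as pd

def extend_with_duplicates(data, date_column, customer_column, qty_column):
    """Hypothetical variant: sum duplicate (date, customer) rows, then zero-fill."""
    data = data.copy()
    data[date_column] = pd.to_datetime(data[date_column])
    data[qty_column] = pd.to_numeric(data[qty_column])

    # Collapse duplicates so the (date, customer) index is unique before reindexing.
    totals = data.groupby([date_column, customer_column])[qty_column].sum()

    days = pd.date_range(data[date_column].min(), data[date_column].max(), freq="D")
    mux = pd.MultiIndex.from_product(
        [days, data[customer_column].unique()], names=[date_column, customer_column]
    )
    return totals.reindex(mux, fill_value=0).reset_index()

# Made-up sample data: customer A has two rows on the same day.
orders = pd.DataFrame({
    "date": ["2016-08-08", "2016-08-08", "2016-08-10"],
    "customer": ["A", "A", "B"],
    "qty": ["400", "100", "50"],
})
extended = extend_with_duplicates(orders, "date", "customer", "qty")
print(extended)
```

If duplicates should be treated as an error rather than summed, checking data.duplicated([date_column, customer_column]).any() before reindexing would be a stricter alternative.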