
Pandas / Python - Group data by same period in time

I have some financial data and want to keep only the last transaction in each given period of time (hour, day, month, ...).

Example:

>>> df
      time  price_BRL     qt              time_dt
1312001297      23.49   1.00  2011-07-30 04:48:17
1312049148      23.40   1.00  2011-07-30 18:05:48
1312121523      23.49   2.00  2011-07-31 14:12:03
1312121523      23.50   6.50  2011-07-31 14:12:03
1312177622      23.40   2.00  2011-08-01 05:47:02
1312206416      23.25   1.00  2011-08-01 13:46:56
1312637929      18.95   1.50  2011-08-06 13:38:49
1312637929      18.95   4.00  2011-08-06 13:38:49
1312817114       0.80   0.01  2011-08-08 15:25:14
1312818289       0.10   0.01  2011-08-08 15:44:49
1312819795       6.00   0.09  2011-08-08 16:09:55
1312847064      16.00   0.86  2011-08-08 23:44:24
1312849282      16.00   6.14  2011-08-09 00:21:22
1312898146      19.90   1.00  2011-08-09 13:55:46
1312915666       6.00   0.01  2011-08-09 18:47:46
1312934897      19.90   1.00  2011-08-10 00:08:17
>>> filter_by_last_day(df)
      time  price_BRL     qt              time_dt
1312049148      23.40   1.00  2011-07-30 18:05:48
1312121523      23.50   6.50  2011-07-31 14:12:03
1312206416      23.25   1.00  2011-08-01 13:46:56
1312637929      18.95   4.00  2011-08-06 13:38:49
1312847064      16.00   0.86  2011-08-08 23:44:24
1312915666       6.00   0.01  2011-08-09 18:47:46
1312934897      19.90   1.00  2011-08-10 00:08:17

I was thinking of using groupby() and taking the mean() of each day (that would also solve my problem, though not exactly), but I don't know how to group by day, with something like df.groupby('time.day').last().

You can use groupby on dt.date and aggregate with last:

import pandas as pd

# if necessary, convert to datetime
df.time_dt = pd.to_datetime(df.time_dt)

df = df.groupby(df.time_dt.dt.date).last().reset_index(drop=True)
print(df)
         time  price_BRL    qt             time_dt
0  1312049148      23.40  1.00 2011-07-30 18:05:48
1  1312121523      23.50  6.50 2011-07-31 14:12:03
2  1312206416      23.25  1.00 2011-08-01 13:46:56
3  1312637929      18.95  4.00 2011-08-06 13:38:49
4  1312847064      16.00  0.86 2011-08-08 23:44:24
5  1312915666       6.00  0.01 2011-08-09 18:47:46
6  1312934897      19.90  1.00 2011-08-10 00:08:17
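
One caveat: GroupBy.last() returns the last non-null value of each column within a group, so if some columns contain NaN, the result can combine values from different rows. Below is a minimal sketch of an alternative that returns whole rows by using tail(1). The helper name last_per_day is hypothetical, and the sample is the first five rows of the question's data.

```python
import pandas as pd

# First five rows of the question's data.
df = pd.DataFrame({
    "time": [1312001297, 1312049148, 1312121523, 1312121523, 1312177622],
    "price_BRL": [23.49, 23.40, 23.49, 23.50, 23.40],
    "qt": [1.0, 1.0, 2.0, 6.5, 2.0],
    "time_dt": pd.to_datetime([
        "2011-07-30 04:48:17",
        "2011-07-30 18:05:48",
        "2011-07-31 14:12:03",
        "2011-07-31 14:12:03",
        "2011-08-01 05:47:02",
    ]),
})


def last_per_day(frame):
    """Return the last complete row of each calendar day (hypothetical helper)."""
    # A stable sort keeps the original order of rows that share a timestamp.
    frame = frame.sort_values("time_dt", kind="mergesort")
    # tail(1) keeps the last row of every group without mixing columns.
    return frame.groupby(frame["time_dt"].dt.date).tail(1).reset_index(drop=True)


result = last_per_day(df)
print(result)
```

For data without missing values, as in the question, this should select the same rows as last().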

Thanks to MaxU for another solution: add the parameter as_index=False so that a DataFrame with a default index is returned:

df = df.groupby(df.time_dt.dt.date, as_index=False).last()
print(df)
         time  price_BRL    qt             time_dt
0  1312049148      23.40  1.00 2011-07-30 18:05:48
1  1312121523      23.50  6.50 2011-07-31 14:12:03
2  1312206416      23.25  1.00 2011-08-01 13:46:56
3  1312637929      18.95  4.00 2011-08-06 13:38:49
4  1312847064      16.00  0.86 2011-08-08 23:44:24
5  1312915666       6.00  0.01 2011-08-09 18:47:46
6  1312934897      19.90  1.00 2011-08-10 00:08:17

A solution with resample also works, but the rows that are entirely NaN (days without any trades) must be removed with dropna:

df = df.resample('D', on='time_dt').last().dropna(how='all').reset_index(drop=True)
# the NaN rows turned column time into float, so cast it back to int
df.time = df.time.astype(int)
print(df)
         time  price_BRL    qt             time_dt
0  1312049148      23.40  1.00 2011-07-30 18:05:48
1  1312121523      23.50  6.50 2011-07-31 14:12:03
2  1312206416      23.25  1.00 2011-08-01 13:46:56
3  1312637929      18.95  4.00 2011-08-06 13:38:49
4  1312847064      16.00  0.86 2011-08-08 23:44:24
5  1312915666       6.00  0.01 2011-08-09 18:47:46
6  1312934897      19.90  1.00 2011-08-10 00:08:17
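
Depending on the pandas version, the column passed to on= may appear only as the index of the resampled result, in which case reset_index(drop=True) would discard the timestamps. Here is a minimal sketch that keeps the original time_dt values by copying the column into the index with set_index(drop=False). This is an assumption about a workable variant, not part of the original answer, and the sample is a few rows of the question's data.

```python
import pandas as pd

# A few rows of the question's data, with a gap between 08-01 and 08-06.
df = pd.DataFrame({
    "time": [1312177622, 1312206416, 1312637929, 1312637929],
    "price_BRL": [23.40, 23.25, 18.95, 18.95],
    "qt": [2.0, 1.0, 1.5, 4.0],
    "time_dt": pd.to_datetime([
        "2011-08-01 05:47:02",
        "2011-08-01 13:46:56",
        "2011-08-06 13:38:49",
        "2011-08-06 13:38:49",
    ]),
})

daily = (
    df.set_index("time_dt", drop=False)  # keep time_dt as a column too
      .resample("D")                     # one bin per calendar day
      .last()                            # last row of each bin, NaN for empty days
      .dropna(how="all")                 # remove the days without trades
      .reset_index(drop=True)
)
# Empty bins turned the integer column into float, so cast it back.
daily["time"] = daily["time"].astype(int)
print(daily)
```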

---

You can also use dt.month:

df = df.groupby(df.time_dt.dt.month).last().reset_index(drop=True)
print(df)
         time  price_BRL   qt             time_dt
0  1312121523       23.5  6.5 2011-07-31 14:12:03
1  1312934897       19.9  1.0 2011-08-10 00:08:17
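
Note that dt.month is only the month number (1 to 12), so data spanning several years would put, for example, every July into one group. A minimal sketch of one way to keep years apart is to group by dt.to_period("M") instead. The 2012 row below is hypothetical and was added only to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    "price_BRL": [23.49, 23.50, 19.90, 21.00],
    "time_dt": pd.to_datetime([
        "2011-07-30 04:48:17",
        "2011-07-31 14:12:03",
        "2011-08-10 00:08:17",
        "2012-07-15 09:00:00",  # hypothetical row, for illustration only
    ]),
})

# Month number only: July 2011 and July 2012 fall into the same group.
by_month_number = df.groupby(df["time_dt"].dt.month).last()
print(len(by_month_number))  # 2 groups

# Year and month together: each calendar month is its own group.
by_year_month = df.groupby(df["time_dt"].dt.to_period("M")).last()
print(len(by_year_month))  # 3 groups
```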

For hours it is a bit more complicated: if you need to group by date and hour together, one solution is to use astype to set the minutes and seconds to 0:

hours = df.time_dt.values.astype('<M8[h]')
print(hours)
['2011-07-30T04' '2011-07-30T18' '2011-07-31T14' '2011-07-31T14'
 '2011-08-01T05' '2011-08-01T13' '2011-08-06T13' '2011-08-06T13'
 '2011-08-08T15' '2011-08-08T15' '2011-08-08T16' '2011-08-08T23'
 '2011-08-09T00' '2011-08-09T13' '2011-08-09T18' '2011-08-10T00']

df = df.groupby(hours).last().reset_index(drop=True)
print(df)
          time  price_BRL    qt             time_dt
0   1312001297      23.49  1.00 2011-07-30 04:48:17
1   1312049148      23.40  1.00 2011-07-30 18:05:48
2   1312121523      23.50  6.50 2011-07-31 14:12:03
3   1312177622      23.40  2.00 2011-08-01 05:47:02
4   1312206416      23.25  1.00 2011-08-01 13:46:56
5   1312637929      18.95  4.00 2011-08-06 13:38:49
6   1312818289       0.10  0.01 2011-08-08 15:44:49
7   1312819795       6.00  0.09 2011-08-08 16:09:55
8   1312847064      16.00  0.86 2011-08-08 23:44:24
9   1312849282      16.00  6.14 2011-08-09 00:21:22
10  1312898146      19.90  1.00 2011-08-09 13:55:46
11  1312915666       6.00  0.01 2011-08-09 18:47:46
12  1312934897      19.90  1.00 2011-08-10 00:08:17
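
The same hourly grouping can also be sketched with dt.floor, which truncates each timestamp to the start of its hour and may read more clearly than the NumPy dtype cast. The sample below is four rows of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1312817114, 1312818289, 1312819795, 1312847064],
    "price_BRL": [0.80, 0.10, 6.00, 16.00],
    "qt": [0.01, 0.01, 0.09, 0.86],
    "time_dt": pd.to_datetime([
        "2011-08-08 15:25:14",
        "2011-08-08 15:44:49",
        "2011-08-08 16:09:55",
        "2011-08-08 23:44:24",
    ]),
})

# Truncate every timestamp to the hour, e.g. 15:25:14 -> 15:00:00.
hours = df["time_dt"].dt.floor("h")

# Keep the last trade of each (date, hour) bucket.
hourly = df.groupby(hours).last().reset_index(drop=True)
print(hourly)
```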
