
Pandas Cumulative Time Series Range in Data Frame

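I would like an "expanding" date range based on the values in the starttime and endtime columns.

If any part of a record overlaps a previous record, I want to return a start time that is the minimum of the two records' start times, and an end time that is the maximum of the two records' end times.

These will be grouped by order ID.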

Order starttime             endtime                 RollingStart            RollingEnd
1   2015-07-01 10:24:43.047 2015-07-01 10:24:43.150 2015-07-01 10:24:43.047 2015-07-01 10:24:43.150
1   2015-07-01 10:24:43.137 2015-07-01 10:24:43.200 2015-07-01 10:24:43.047 2015-07-01 10:24:43.200
1   2015-07-01 10:24:43.197 2015-07-01 10:24:57.257 2015-07-01 10:24:43.047 2015-07-01 10:24:57.257
1   2015-07-01 10:24:57.465 2015-07-01 10:25:13.470 2015-07-01 10:24:57.465 2015-07-01 10:25:13.470
1   2015-07-01 10:24:57.730 2015-07-01 10:25:13.485 2015-07-01 10:24:57.465 2015-07-01 10:25:13.485
2   2015-07-01 10:48:57.465 2015-07-01 10:48:13.485 2015-07-01 10:48:57.465 2015-07-01 10:48:13.485

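So in the example above, Order 1 has an initial range from 2015-07-01 10:24:43.047 to 2015-07-01 10:24:57.257, and then a second range from 2015-07-01 10:24:57.465 to 2015-07-01 10:25:13.485.

Note that while the start times are in order, the end times are not necessarily, due to the nature of the data (there are short events and long events).

Finally, I only want the last record for each Order, RollingStart combination (so in this case, the two last records).

I tried: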

df['RollingStart'] = np.where((df['endtime'] >= df['RollingStart'].shift()) & (df['RollingEnd'].shift()>= df['starttime']), min(df['starttime'],df['RollingStart']),df['starttime'])

(This obviously doesn't include the order ID.)

But the error I get is:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any ideas would be much appreciated.

Code to reproduce:

from io import StringIO

import numpy as np
import pandas as pd

text = """Order   starttime               endtime
1       2015-07-01 10:24:43.047  2015-07-01 10:24:43.150
1       2015-07-01 10:24:43.137  2015-07-01 10:24:43.200
1       2015-07-01 10:24:43.197  2015-07-01 10:24:57.257
1       2015-07-01 10:24:57.465  2015-07-01 10:25:13.470
1       2015-07-01 10:24:57.730  2015-07-01 10:25:13.485
2       2015-07-01 10:48:57.465  2015-07-01 10:48:13.485"""

df = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python', parse_dates=[1, 2])

# Seed the rolling columns so that .shift() has something to work on
df['RollingStart'] = df['starttime']
df['RollingEnd'] = df['endtime']

# The built-in min() on two Series is what triggers the ValueError
df['RollingStart'] = np.where((df['endtime'] >= df['RollingStart'].shift()) & (df['RollingEnd'].shift() >= df['starttime']), min(df['starttime'], df['RollingStart']), df['starttime'])

The error is:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 731, in     __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Thanks

It looks like you are trying to return a value based on a value that has not been set yet:

df['start'] =...conditions... df['start'].shift()

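It seems to me that you are trying to set a condition on a column that pandas knows nothing about.

If you are just trying to set the "start" value to the latest time in those columns, try building the statement with "or" conditions, or create a temporary array and take its max (if you are only after the latest time):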

df['start'] = df[['enddatetime', 'startdatetime']].max(axis=1)

If the above doesn't work, do you have code that reproduces this df, so I can check whether I get the same error?

Try this:

Version 1
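As a side note on the error itself: the built-in min() compares its two arguments with `<` and then needs a single True/False, which a Series cannot provide. A minimal sketch of an element-wise alternative follows; the two small Series `a` and `b` are made up for illustration and are not the asker's columns.

```python
import pandas as pd

# Two illustrative datetime Series (hypothetical values, same length)
a = pd.Series(pd.to_datetime(["2015-07-01 10:24:43.047", "2015-07-01 10:24:57.465"]))
b = pd.Series(pd.to_datetime(["2015-07-01 10:24:43.137", "2015-07-01 10:24:43.197"]))

# Built-in min() evaluates `b < a` and then calls bool() on the resulting Series,
# which is ambiguous for more than one element.
try:
    min(a, b)
    builtin_min_failed = False
except ValueError:
    builtin_min_failed = True

# Element-wise versions: keep a where the condition holds, otherwise take b
earliest = a.where(a <= b, b)
latest = a.where(a >= b, b)

print(builtin_min_failed)
print(earliest)
print(latest)
```

Series.where keeps the datetime dtype, so no pd.to_datetime round trip is needed afterwards.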

NaT = pd.NaT
df['Rolling2']     = np.where(df['starttime'].shift(-1) > df['endtime'], NaT,'drop')
df['Rolling2']     = df['Rolling2'].shift(1)
df['RollingStart'] = np.where(df['Rolling2']  =='drop',None,df['starttime'])
df['RollingStart'] = pd.to_datetime(df['RollingStart']).ffill()
df['RollingEnd']   = df['endtime']
del df['Rolling2']

Version 2

df['RollingStart'] = df['starttime']
df['RollingEnd']   = df['endtime']
df['RollingStart'] = np.where(df['RollingEnd'].shift()>= df['starttime'] ,pd.NaT , df['RollingStart'])
df['RollingStart'] = pd.to_datetime(df['RollingStart']).ffill()


  Order               starttime                 endtime            RollingStart              RollingEnd
0      1 2015-07-01 10:24:43.047 2015-07-01 10:24:43.150 2015-07-01 10:24:43.047 2015-07-01 10:24:43.150
1      1 2015-07-01 10:24:43.137 2015-07-01 10:24:43.200 2015-07-01 10:24:43.047 2015-07-01 10:24:43.200
2      1 2015-07-01 10:24:43.197 2015-07-01 10:24:57.257 2015-07-01 10:24:43.047 2015-07-01 10:24:57.257
3      1 2015-07-01 10:24:57.465 2015-07-01 10:25:13.470 2015-07-01 10:24:57.465 2015-07-01 10:25:13.470
4      1 2015-07-01 10:24:57.730 2015-07-01 10:25:13.485 2015-07-01 10:24:57.465 2015-07-01 10:25:13.485
5      2 2015-07-01 10:48:57.465 2015-07-01 10:48:13.485 2015-07-01 10:48:57.465 2015-07-01 10:48:13.485
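Neither version above groups by Order or keeps a running maximum of the end time, both of which the question asks for. The following is only a sketch of one way to cover them, assuming rows can be sorted by Order and starttime; the helper column `range_id` and the variable names are my own and do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Order": [1, 1, 1, 1, 1, 2],
    "starttime": pd.to_datetime([
        "2015-07-01 10:24:43.047", "2015-07-01 10:24:43.137",
        "2015-07-01 10:24:43.197", "2015-07-01 10:24:57.465",
        "2015-07-01 10:24:57.730", "2015-07-01 10:48:57.465",
    ]),
    "endtime": pd.to_datetime([
        "2015-07-01 10:24:43.150", "2015-07-01 10:24:43.200",
        "2015-07-01 10:24:57.257", "2015-07-01 10:25:13.470",
        "2015-07-01 10:25:13.485", "2015-07-01 10:48:13.485",
    ]),
})
df = df.sort_values(["Order", "starttime"]).reset_index(drop=True)

# Latest end time seen so far in each order, then shifted so that every row
# sees only the rows before it (the first row of an order gets NaT).
running_end = df.groupby("Order")["endtime"].cummax()
prev_max_end = running_end.groupby(df["Order"]).shift()

# A row opens a new range if it is the first of its order or starts after
# every earlier record in that order has ended.
new_range = prev_max_end.isna() | (df["starttime"] > prev_max_end)
df["range_id"] = new_range.astype(int).groupby(df["Order"]).cumsum()

# Within each (Order, range_id): earliest start, and end time as a running max
grouped = df.groupby(["Order", "range_id"])
df["RollingStart"] = grouped["starttime"].transform("min")
df["RollingEnd"] = grouped["endtime"].cummax()

# Last record of each Order / RollingStart combination
last = df.drop_duplicates(["Order", "RollingStart"], keep="last")
print(last)
```

On the sample data this reproduces the RollingStart and RollingEnd columns shown in the question. Because the running maximum is used rather than just the previous row's endtime, a long event followed by several short ones stays in a single range.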
