Pandas Cumulative Time Series Range in Data Frame
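I would like to have an "expanding" date range based on the values in the starttime and endtime columns.
If any part of a record overlaps an earlier record, I want to return a start time that is the minimum of the two start times and an end time that is the maximum of the two end times.
These will be grouped by order ID.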
Order starttime endtime RollingStart RollingEnd
1 2015-07-01 10:24:43.047 2015-07-01 10:24:43.150 2015-07-01 10:24:43.047 2015-07-01 10:24:43.150
1 2015-07-01 10:24:43.137 2015-07-01 10:24:43.200 2015-07-01 10:24:43.047 2015-07-01 10:24:43.200
1 2015-07-01 10:24:43.197 2015-07-01 10:24:57.257 2015-07-01 10:24:43.047 2015-07-01 10:24:57.257
1 2015-07-01 10:24:57.465 2015-07-01 10:25:13.470 2015-07-01 10:24:57.465 2015-07-01 10:25:13.470
1 2015-07-01 10:24:57.730 2015-07-01 10:25:13.485 2015-07-01 10:24:57.465 2015-07-01 10:25:13.485
2 2015-07-01 10:48:57.465 2015-07-01 10:48:13.485 2015-07-01 10:48:57.465 2015-07-01 10:48:13.485
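So in the example above, the initial range for order 1 runs from 2015-07-01 10:24:43.047 to 2015-07-01 10:24:57.257, and the next one runs from 2015-07-01 10:24:57.465 to 2015-07-01 10:25:13.485.
Note that while the start times are ordered, the end times are not necessarily ordered, because of the nature of the data (there are short events and long events).
In the end I only want the last record for each Order, RollingStart combination (so in this case, the last two records).
I tried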
df['RollingStart'] = np.where((df['endtime'] >= df['RollingStart'].shift()) & (df['RollingEnd'].shift()>= df['starttime']), min(df['starttime'],df['RollingStart']),df['starttime'])
(this obviously does not include the order ID)
but the error I get is
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any ideas would be greatly appreciated.
Code to reproduce:
from io import StringIO
import numpy as np
import pandas as pd
text = """Order  starttime  endtime
1  2015-07-01 10:24:43.047  2015-07-01 10:24:43.150
1  2015-07-01 10:24:43.137  2015-07-01 10:24:43.200
1  2015-07-01 10:24:43.197  2015-07-01 10:24:57.257
1  2015-07-01 10:24:57.465  2015-07-01 10:25:13.470
1  2015-07-01 10:24:57.730  2015-07-01 10:25:13.485
2  2015-07-01 10:48:57.465  2015-07-01 10:48:13.485"""
# Columns are separated by two or more spaces, so the single space inside each timestamp is kept
df = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python', parse_dates=[1, 2])
df['RollingStart'] = df['starttime']
df['RollingEnd'] = df['endtime']
# The builtin min() applied to two Series is what raises the ValueError
df['RollingStart'] = np.where((df['endtime'] >= df['RollingStart'].shift()) & (df['RollingEnd'].shift() >= df['starttime']), min(df['starttime'], df['RollingStart']), df['starttime'])
The error is:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 731, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Thanks
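It looks like you are trying to return a value based on a value that has not been set yet,
df['start'] =...conditions... df['start'].shift()
It seems to me that you are trying to set a condition on a column that pandas knows nothing about.
If you are just trying to set the "start" value to the latest time across these columns, try building the statement with an or clause, or create a temporary array and take its max (if all you want is the latest time)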
df['start'] = df[['enddatetime', 'startdatetime']].max(axis=1)
If the above does not work, do you have code that reproduces this df, so I can check whether I get the same error?
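As a minimal sketch of why the ValueError appears: the builtin min() and max() compare their arguments with a single `<`, which on two Series produces a boolean Series that Python then tries to reduce to one bool. The two-row frame below is made up for illustration and reuses the question's column names; a row-wise reduction or Series.where are two element-wise alternatives that should avoid the error.

```python
import pandas as pd

# Illustrative two-row frame (values chosen for this example only).
df = pd.DataFrame({
    "starttime": pd.to_datetime(["2015-07-01 10:24:43.047",
                                 "2015-07-01 10:24:43.137"]),
    "RollingStart": pd.to_datetime(["2015-07-01 10:24:43.047",
                                    "2015-07-01 10:24:43.047"]),
})

# Builtin min() needs `b < a` to be a single bool; for Series it is not.
try:
    min(df["starttime"], df["RollingStart"])
    raised = False
except ValueError:
    raised = True

# Element-wise alternative 1: reduce across the two columns, row by row.
rowwise_min = df[["starttime", "RollingStart"]].min(axis=1)

# Element-wise alternative 2: keep starttime where it is the smaller value,
# otherwise take RollingStart.
where_min = df["starttime"].where(df["starttime"] <= df["RollingStart"],
                                  df["RollingStart"])
```

Either result is a Series with one value per row, so it can be passed straight to np.where as the "true" branch.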
Try this:
Version 1
NaT = pd.NaT
df['Rolling2'] = np.where(df['starttime'].shift(-1) > df['endtime'], NaT,'drop')
df['Rolling2'] = df['Rolling2'].shift(1)
df['RollingStart'] = np.where(df['Rolling2'] =='drop',None,df['starttime'])
df['RollingStart'] = pd.to_datetime(df['RollingStart']).ffill()
df['RollingEnd'] = df['endtime']
del df['Rolling2']
Version 2
df['RollingStart'] = df['starttime']
df['RollingEnd'] = df['endtime']
df['RollingStart'] = np.where(df['RollingEnd'].shift()>= df['starttime'] ,pd.NaT , df['RollingStart'])
df['RollingStart'] = pd.to_datetime(df['RollingStart']).ffill()
Order starttime endtime RollingStart RollingEnd
0 1 2015-07-01 10:24:43.047 2015-07-01 10:24:43.150 2015-07-01 10:24:43.047 2015-07-01 10:24:43.150
1 1 2015-07-01 10:24:43.137 2015-07-01 10:24:43.200 2015-07-01 10:24:43.047 2015-07-01 10:24:43.200
2 1 2015-07-01 10:24:43.197 2015-07-01 10:24:57.257 2015-07-01 10:24:43.047 2015-07-01 10:24:57.257
3 1 2015-07-01 10:24:57.465 2015-07-01 10:25:13.470 2015-07-01 10:24:57.465 2015-07-01 10:25:13.470
4 1 2015-07-01 10:24:57.730 2015-07-01 10:25:13.485 2015-07-01 10:24:57.465 2015-07-01 10:25:13.485
5 2 2015-07-01 10:48:57.465 2015-07-01 10:48:13.485 2015-07-01 10:48:57.465 2015-07-01 10:48:13.485
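Neither version above groups by Order or keeps only the last record of each range, which the question also asks for. One possible way to cover both is sketched below. It is not taken from the answer: the comma-separated input, the helper column range_id, and the choice of a per-order cumulative maximum are all assumptions made for this sketch. It compares each start time against the largest end time seen so far within the same order, rather than only the previous row, because the question notes that end times are not necessarily ordered.

```python
from io import StringIO
import pandas as pd

# Same sample rows as the question, written comma-separated for this sketch.
text = """Order,starttime,endtime
1,2015-07-01 10:24:43.047,2015-07-01 10:24:43.150
1,2015-07-01 10:24:43.137,2015-07-01 10:24:43.200
1,2015-07-01 10:24:43.197,2015-07-01 10:24:57.257
1,2015-07-01 10:24:57.465,2015-07-01 10:25:13.470
1,2015-07-01 10:24:57.730,2015-07-01 10:25:13.485
2,2015-07-01 10:48:57.465,2015-07-01 10:48:13.485"""

df = pd.read_csv(StringIO(text), parse_dates=["starttime", "endtime"])
df = df.sort_values(["Order", "starttime"]).reset_index(drop=True)

# Largest endtime among the earlier rows of the same order (NaT on the first row).
prev_end = df.groupby("Order")["endtime"].transform(lambda s: s.cummax().shift())

# A row opens a new range if it is the first row of its order
# or starts after every end time seen so far in that order.
new_range = prev_end.isna() | (df["starttime"] > prev_end)
df["range_id"] = new_range.cumsum()  # hypothetical helper column

by_range = df.groupby("range_id")
df["RollingStart"] = by_range["starttime"].transform("min")
df["RollingEnd"] = by_range["endtime"].transform(lambda s: s.cummax())

# Keep only the last record of each Order, RollingStart combination.
result = df.drop_duplicates(subset=["Order", "RollingStart"], keep="last")
```

On the sample data this yields the same RollingStart and RollingEnd columns as the table above, and result holds one row per range (two for order 1, one for order 2).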