How to compare an argument to separate columns in each row of a Pandas dataframe?
I have a DataFrame dwells with Event ID, Start Time, and Stop Time columns:
In []: dwells[['Event ID','Start Time','Stop Time']].head()
Out[]:
    Event ID          Start Time           Stop Time
0  367067960 2016-09-01 00:05:00 2016-10-05 14:00:00
1  311288000 2016-09-01 00:05:00 2016-09-01 23:30:00
2  636016999 2016-09-01 00:05:00 2016-09-01 01:50:00
3  247304600 2016-09-01 01:20:00 2016-09-01 21:25:00
4  636016590 2016-09-01 06:55:00 2016-09-01 23:35:00
In []: dwells[['Event ID','Start Time','Stop Time']].dtypes
Out[]:
Event ID int64
Start Time datetime64[ns]
Stop Time datetime64[ns]
dtype: object
I'm trying to determine the number of events that were occurring at every increment of a DatetimeIndex with 5-minute frequency. Eventually I want to know the number of events and the cumulative time that the events were occurring (by multiplying the number of events by the time step and taking the cumsum):
In []: start = datetime(2016,9,1)
...: end = datetime(2016,12,31)
...: rng = pd.date_range(start, end, freq='5min')
...: rng[:5]
Out[]:
DatetimeIndex(['2016-09-01 00:00:00', '2016-09-01 00:05:00',
'2016-09-01 00:10:00', '2016-09-01 00:15:00',
'2016-09-01 00:20:00'],
dtype='datetime64[ns]', freq='5T')
I want to loop over the DatetimeIndex and compare each entry to the Start Time and Stop Time of every row to see if it falls between them, setting an appropriate value in a new FLAG field. I can then sum the FLAG field and set it as the value of a series with rng as the index, like:
series = pd.Series(index=rng)
for x in rng:
    dwells['FLAG'] = dwells[['Start Time', 'Stop Time']].apply(lambda i,j: 1 if i.value <= x.value <= j.value else 0)
    series.loc[x] = dwells['FLAG'].sum()
That apply call doesn't work: by default, apply passes each column to the function as a single Series argument, so a two-argument lambda fails. I haven't been able to come up with a function that lets me check the x value against the time range in every row.
I'd appreciate help defining a function that gives me an output like:
In []: series[:5]
Out[]:
2016-09-01 00:00:00 37
2016-09-01 00:05:00 39
2016-09-01 00:10:00 40
2016-09-01 00:15:00 39
2016-09-01 00:20:00 35
If there's a more efficient approach to solving this problem, I'd appreciate that as well.
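For reference, the per-timestamp check described above can be sketched without apply by comparing whole columns against a single timestamp. This is only a minimal sketch: the three-row dwells frame is a cut-down copy of the sample shown earlier, and count_active is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Assumed sample data: the first three rows of `dwells` from the question.
dwells = pd.DataFrame({
    'Event ID': [367067960, 311288000, 636016999],
    'Start Time': pd.to_datetime(['2016-09-01 00:05:00',
                                  '2016-09-01 00:05:00',
                                  '2016-09-01 00:05:00']),
    'Stop Time': pd.to_datetime(['2016-10-05 14:00:00',
                                 '2016-09-01 23:30:00',
                                 '2016-09-01 01:50:00']),
})

def count_active(df, x):
    """Hypothetical helper: count rows where Start Time <= x <= Stop Time."""
    # Each comparison is evaluated for the whole column at once,
    # giving a boolean Series; True counts as 1 when summed.
    active = (df['Start Time'] <= x) & (df['Stop Time'] >= x)
    return int(active.sum())

# Shortened range for illustration (the question uses Sep 1 to Dec 31).
rng = pd.date_range('2016-09-01 00:00:00', periods=3, freq='5min')
series = pd.Series([count_active(dwells, x) for x in rng], index=rng)
```

This still loops over rng in Python, so it should be quicker than a row-wise apply but is not the fastest option for a long index.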
I found a good starting point at this post: python pandas: apply a function with arguments to a series
Which led me to the documentation on defining a custom function with keyword arguments applied to a series, here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html#pandas.Series.apply
Since a DF row is a series, I wrote the following:
def flag_events(row, **kwargs):
    '''Applied row-wise to a DF, checks if kwargs['t_step'] is between 'Start Time' and 'Stop Time', returning 1 if yes and 0 if no'''
    if row['Start Time'].value <= kwargs['t_step'] <= row['Stop Time'].value:
        return 1
    else:
        return 0
DwellTable = pd.DataFrame(index=rng)
DwellTable['EventCount'] = DwellTable.index.map(lambda x: dwells.apply(flag_events, t_step=x.value, axis=1).sum())
DwellTable['DwellMin'] = DwellTable['EventCount']*5
DwellTable['DwellMinCum'] = DwellTable['DwellMin'].cumsum()
That worked, but it takes a long time to run. I'd still appreciate suggestions for a more efficient approach.
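One possible faster approach, offered as a sketch rather than a definitive answer, avoids the per-timestamp loop entirely: sort the start and stop times once, then use np.searchsorted to count how many events have started and how many have already ended at each time step. It assumes every row has Start Time <= Stop Time and treats both endpoints as inclusive, matching flag_events. The five-row dwells frame and the one-day rng are illustrative stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Assumed sample data: the five rows of `dwells` shown in the question.
dwells = pd.DataFrame({
    'Event ID': [367067960, 311288000, 636016999, 247304600, 636016590],
    'Start Time': pd.to_datetime(['2016-09-01 00:05:00', '2016-09-01 00:05:00',
                                  '2016-09-01 00:05:00', '2016-09-01 01:20:00',
                                  '2016-09-01 06:55:00']),
    'Stop Time': pd.to_datetime(['2016-10-05 14:00:00', '2016-09-01 23:30:00',
                                 '2016-09-01 01:50:00', '2016-09-01 21:25:00',
                                 '2016-09-01 23:35:00']),
})

# One day at 5-minute steps for illustration (the question spans Sep 1 to Dec 31).
rng = pd.date_range('2016-09-01', '2016-09-02', freq='5min')

# Sort each column independently; only the counts matter, not which row is which.
starts = np.sort(dwells['Start Time'].values)
stops = np.sort(dwells['Stop Time'].values)
t = rng.values

# Number of events with Start Time <= t (side='right' includes ties).
started = np.searchsorted(starts, t, side='right')
# Number of events with Stop Time < t (side='left' excludes ties),
# so an event still counts at the exact moment it stops.
ended = np.searchsorted(stops, t, side='left')

DwellTable = pd.DataFrame({'EventCount': started - ended}, index=rng)
DwellTable['DwellMin'] = DwellTable['EventCount'] * 5
DwellTable['DwellMinCum'] = DwellTable['DwellMin'].cumsum()
```

The cost is two sorts plus one binary search per time step, instead of a full pass over dwells for every entry in rng.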