
How to compare an argument to separate columns in each row of a Pandas dataframe?

I have a DataFrame, dwells, with Event ID, Start Time, and Stop Time columns:

In []: dwells[['Event ID','Start Time','Stop Time']].head()
Out[]: 
    Event ID          Start Time           Stop Time
0  367067960 2016-09-01 00:05:00 2016-10-05 14:00:00
1  311288000 2016-09-01 00:05:00 2016-09-01 23:30:00
2  636016999 2016-09-01 00:05:00 2016-09-01 01:50:00
3  247304600 2016-09-01 01:20:00 2016-09-01 21:25:00
4  636016590 2016-09-01 06:55:00 2016-09-01 23:35:00

In []: dwells[['Event ID','Start Time','Stop Time']].dtypes
Out[]: 
Event ID               int64
Start Time    datetime64[ns]
Stop Time     datetime64[ns]
dtype: object

I'm trying to determine how many events were occurring at each increment of a DatetimeIndex with a 5-minute frequency. Ultimately I want the number of events and the cumulative time during which events were occurring. I plan to get the cumulative time by multiplying each count by the time step and taking the cumsum:

In []: from datetime import datetime
  ...: import pandas as pd
  ...: start = datetime(2016,9,1)
  ...: end = datetime(2016,12,31)
  ...: rng = pd.date_range(start, end, freq='5min')
  ...: rng[:5]
Out[]: 
DatetimeIndex(['2016-09-01 00:00:00', '2016-09-01 00:05:00',
               '2016-09-01 00:10:00', '2016-09-01 00:15:00',
               '2016-09-01 00:20:00'],
              dtype='datetime64[ns]', freq='5T')
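The bookkeeping described above (event count times the 5-minute step, then a running total) might look like the following. This is a minimal sketch that assumes the per-step counts are already known. The counts below are just the sample values from the desired output shown later in the question, not values computed from dwells.

```python
import pandas as pd

# Assumed per-step event counts (the sample values from the desired output).
rng = pd.date_range("2016-09-01 00:00", periods=5, freq="5min")
counts = pd.Series([37, 39, 40, 39, 35], index=rng, name="EventCount")

step_minutes = 5  # width of each rng increment, in minutes

dwell = counts.to_frame()
# Event-minutes accumulated during each 5-minute step
dwell["DwellMin"] = dwell["EventCount"] * step_minutes
# Running total of event-minutes since the first timestamp in rng
dwell["DwellMinCum"] = dwell["DwellMin"].cumsum()
print(dwell)
```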

I want to loop over the DatetimeIndex and compare each timestamp with the Start Time and Stop Time of every row. If the timestamp falls between them, a new FLAG column should be set to 1, and otherwise to 0. I can then sum the FLAG column and store the total in a Series indexed by rng, like:

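series = pd.Series(index=rng)
for x in rng:
    # Fails: DataFrame.apply passes the function one Series at a time (one per column,
    # since axis=0 by default), so the two-argument lambda never receives both i and j.
    dwells['FLAG'] = dwells[['Start Time', 'Stop Time']].apply(lambda i,j: 1 if i.value <= x.value <= j.value else 0)
    series.loc[x] = dwells['FLAG'].sum()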

That apply call doesn't work, and I haven't been able to come up with a function that checks x against the time range in every row.
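The same per-timestamp loop can run without apply if x is compared against both columns directly. Pandas compares a scalar Timestamp element-wise against a datetime64 column and returns a boolean Series, and summing that Series counts the active events. This is a minimal sketch using a hypothetical five-row frame built from the rows printed above. It is still one pass over dwells per timestamp, so it only removes the apply overhead.

```python
import pandas as pd

# Hypothetical sample: the first five rows shown in the question.
dwells = pd.DataFrame({
    "Event ID": [367067960, 311288000, 636016999, 247304600, 636016590],
    "Start Time": pd.to_datetime(["2016-09-01 00:05", "2016-09-01 00:05",
                                  "2016-09-01 00:05", "2016-09-01 01:20",
                                  "2016-09-01 06:55"]),
    "Stop Time": pd.to_datetime(["2016-10-05 14:00", "2016-09-01 23:30",
                                 "2016-09-01 01:50", "2016-09-01 21:25",
                                 "2016-09-01 23:35"]),
})
rng = pd.date_range("2016-09-01", "2016-09-02", freq="5min")

counts = pd.Series(0, index=rng, dtype="int64")
for x in rng:
    # Boolean Series, one entry per event: True if x is inside [Start Time, Stop Time]
    active = (dwells["Start Time"] <= x) & (x <= dwells["Stop Time"])
    counts.loc[x] = int(active.sum())  # True counts as 1, False as 0

print(counts.head())
```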

I'd appreciate help defining a function that produces output like this:

In []: series[:5]
Out[]:
2016-09-01 00:00:00   37
2016-09-01 00:05:00   39
2016-09-01 00:10:00   40
2016-09-01 00:15:00   39
2016-09-01 00:20:00   35

If there's a more efficient approach to this problem, I'd appreciate that as well.
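A fully vectorized option uses NumPy broadcasting. The timestamps form a column and the event boundaries form rows, so one comparison yields a (len(rng), n_events) boolean grid, and summing across each row gives the count per timestamp. This is a minimal sketch on a hypothetical five-row sample. It is not taken from the question. Memory grows with len(rng) × len(dwells), so for long ranges or many events the grid may need to be processed in chunks of rng.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first five events shown in the question.
dwells = pd.DataFrame({
    "Start Time": pd.to_datetime(["2016-09-01 00:05", "2016-09-01 00:05",
                                  "2016-09-01 00:05", "2016-09-01 01:20",
                                  "2016-09-01 06:55"]),
    "Stop Time": pd.to_datetime(["2016-10-05 14:00", "2016-09-01 23:30",
                                 "2016-09-01 01:50", "2016-09-01 21:25",
                                 "2016-09-01 23:35"]),
})
rng = pd.date_range("2016-09-01", "2016-09-02", freq="5min")

starts = dwells["Start Time"].to_numpy()  # shape (n_events,)
stops = dwells["Stop Time"].to_numpy()    # shape (n_events,)
t = rng.to_numpy()[:, None]               # shape (len(rng), 1)

# Broadcasting produces a (len(rng), n_events) grid:
# True where timestamp t[i] lies inside event j's [start, stop] window.
inside = (starts[None, :] <= t) & (t <= stops[None, :])
series = pd.Series(inside.sum(axis=1), index=rng, name="EventCount")
print(series.head())
```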

I found a good starting point in this post: python pandas: apply a function with arguments to a series

That led me to the documentation on passing keyword arguments to a custom function with Series.apply: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html#pandas.Series.apply
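The mechanism that documentation describes can be shown in isolation: Series.apply forwards extra keyword arguments to the function on every call. DataFrame.apply forwards them the same way, and the row-wise code relies on that. This is a small sketch. The helper in_window and its start/stop parameters are hypothetical names chosen for illustration.

```python
import pandas as pd

def in_window(ts, start, stop):
    """Hypothetical helper: return 1 if ts lies in [start, stop], else 0."""
    return 1 if start <= ts <= stop else 0

times = pd.Series(pd.to_datetime(["2016-09-01 00:00",
                                  "2016-09-01 00:05",
                                  "2016-09-01 02:00"]))

# start= and stop= are not parameters of Series.apply, so they are passed on to in_window.
flags = times.apply(in_window,
                    start=pd.Timestamp("2016-09-01 00:05"),
                    stop=pd.Timestamp("2016-09-01 01:50"))
print(flags.tolist())  # [0, 1, 0]
```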

Since apply with axis=1 passes each DataFrame row to the function as a Series, I wrote the following:

def flag_events(row, **kwargs):
    '''Applied row-wise to a DF; returns 1 if kwargs['t_step'] lies between 'Start Time' and 'Stop Time', else 0.'''
    # .value is the timestamp as integer nanoseconds since the epoch
    if row['Start Time'].value <= kwargs['t_step'] <= row['Stop Time'].value:
        return 1
    else:
        return 0

DwellTable = pd.DataFrame(index=rng)

# For each timestamp x in rng, flag every row of dwells and count the active events
DwellTable['EventCount'] = DwellTable.index.map(lambda x: dwells.apply(flag_events, t_step=x.value, axis=1).sum())

# Each step is 5 minutes, so count * 5 gives event-minutes per step
DwellTable['DwellMin'] = DwellTable['EventCount']*5
DwellTable['DwellMinCum'] = DwellTable['DwellMin'].cumsum()

That worked, but it is slow: it runs a row-wise apply over all of dwells once for each of the roughly 35,000 timestamps in rng. I'd still appreciate suggestions for a more efficient approach.
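One way to avoid the per-timestamp loop entirely uses sorted boundaries and binary search. The number of events active at time t equals the number with Start Time <= t minus the number with Stop Time < t, and np.searchsorted computes both counts for every timestamp at once. This technique is swapped in here and is not the question's method. It is a minimal sketch that assumes Start Time <= Stop Time for every event. It runs in roughly O((n_events + len(rng)) log n_events) time without building a large grid. The data is a hypothetical five-row sample.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first five events shown in the question.
dwells = pd.DataFrame({
    "Start Time": pd.to_datetime(["2016-09-01 00:05", "2016-09-01 00:05",
                                  "2016-09-01 00:05", "2016-09-01 01:20",
                                  "2016-09-01 06:55"]),
    "Stop Time": pd.to_datetime(["2016-10-05 14:00", "2016-09-01 23:30",
                                 "2016-09-01 01:50", "2016-09-01 21:25",
                                 "2016-09-01 23:35"]),
})
rng = pd.date_range("2016-09-01", "2016-09-02", freq="5min")

# Sort each boundary column independently (valid only if every start <= its stop).
starts = np.sort(dwells["Start Time"].to_numpy())
stops = np.sort(dwells["Stop Time"].to_numpy())
t = rng.to_numpy()

started = np.searchsorted(starts, t, side="right")  # events with Start Time <= t
stopped = np.searchsorted(stops, t, side="left")    # events with Stop Time < t

DwellTable = pd.DataFrame({"EventCount": started - stopped}, index=rng)
DwellTable["DwellMin"] = DwellTable["EventCount"] * 5  # event-minutes per 5-min step
DwellTable["DwellMinCum"] = DwellTable["DwellMin"].cumsum()
print(DwellTable.head())
```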

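Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.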
