How to assign group labels to pandas df rows that have a datetime within a specific interval?

I am trying to sort .txt files based on the time they were created. A set of 6-8 .txt files is created multiple times a day, each set within only a few minutes. I do not know the exact time intervals, so I will have to find a way to automatically find the closest matching date-times (e.g., all that are less than 15 min apart). I have been able to extract the DateTime for each file. Now I would like to assign a group label that indicates which .txt files were created in one set (i.e., within a few minutes of each other).
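
For reference, the step of getting one timestamp per file could be sketched as below. This is an assumption, not the asker's code: it reads all .txt files from a single folder and uses the modification time (st_mtime), because a true creation time is not portable (st_ctime is the creation time on Windows but the metadata-change time on Linux). The helper name file_times is hypothetical.

```python
from pathlib import Path

import pandas as pd


def file_times(folder):
    """Return a DataFrame of the .txt files in `folder`, sorted by modification time."""
    paths = list(Path(folder).glob("*.txt"))
    # st_mtime is seconds since the epoch; unit="s" turns it into (UTC, tz-naive) Timestamps
    times = pd.to_datetime([p.stat().st_mtime for p in paths], unit="s")
    df = pd.DataFrame({"file": [p.name for p in paths]}, index=times)
    df.index.name = "time"
    # sorting the DatetimeIndex puts the files in the order they were written
    return df.sort_index()
```

The result has a DatetimeIndex like the df in the question, so it can be fed into the labelling step directly.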

My current df looks like this:

index                         values
2020-09-06 17:25:14           97
2020-09-06 17:25:33            0
2020-09-06 17:27:00            3
2020-09-06 17:28:13            7
2020-09-06 17:29:28           10
2020-09-06 17:30:07           26
2020-09-06 17:30:40           34
2020-09-06 17:31:13           34
2020-09-06 18:07:34           99
2020-09-06 18:08:07            0
2020-09-06 18:08:35            3
2020-09-06 18:09:00            8
2020-09-06 18:09:24           11
2020-09-06 18:09:57           32
2020-09-06 18:10:24           43
2020-09-06 19:03:45           99
2020-09-06 19:04:31            0
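
To make the example reproducible, the df above can be rebuilt from the listed timestamps and values. This is only a reconstruction of the printed table; the real df comes from the asker's files.

```python
import pandas as pd

# timestamps and values copied from the table above
times = [
    "2020-09-06 17:25:14", "2020-09-06 17:25:33", "2020-09-06 17:27:00",
    "2020-09-06 17:28:13", "2020-09-06 17:29:28", "2020-09-06 17:30:07",
    "2020-09-06 17:30:40", "2020-09-06 17:31:13", "2020-09-06 18:07:34",
    "2020-09-06 18:08:07", "2020-09-06 18:08:35", "2020-09-06 18:09:00",
    "2020-09-06 18:09:24", "2020-09-06 18:09:57", "2020-09-06 18:10:24",
    "2020-09-06 19:03:45", "2020-09-06 19:04:31",
]
values = [97, 0, 3, 7, 10, 26, 34, 34, 99, 0, 3, 8, 11, 32, 43, 99, 0]

# DatetimeIndex named "index" to match the printed frame
df = pd.DataFrame({"values": values}, index=pd.DatetimeIndex(times, name="index"))
```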

I would like to automatically assign label "a" to all rows between 17:25 and 17:31, label "b" to all rows between 18:07 and 18:10, and label "c" to all rows between 19:03 and 19:04.
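
Because the windows are not known in advance, one common technique (a different approach from the answer below) is to start a new group wherever the gap to the previous timestamp exceeds a threshold. The sketch assumes a sorted DatetimeIndex and the 15-minute threshold from the question; the function name label_sets is hypothetical.

```python
import string

import pandas as pd


def label_sets(df, gap="15min"):
    """Label rows 'a', 'b', ...; a new label starts when the jump to the previous row exceeds `gap`."""
    times = df.index.to_series()
    # True where this row is more than `gap` after the previous one (first diff is NaT -> False)
    new_set = times.diff() > pd.Timedelta(gap)
    # running count of set starts gives 0, 1, 2, ... per row
    set_id = new_set.cumsum()
    # map 0 -> 'a', 1 -> 'b', ... (assumes at most 26 sets)
    return set_id.map(lambda i: string.ascii_lowercase[i])
```

With the sample data, df['label'] = label_sets(df) gives "a" for 17:25:14 through 17:31:13, "b" for 18:07:34 through 18:10:24 and "c" for the two rows after 19:03, since the gaps between sets (over 30 minutes) are far larger than the gaps within a set.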

Most solutions I have found only aggregate (df.groupby(), df.resample(), pd.Grouper()). Can I use one of these methods to create my labels?
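
groupby() does not have to aggregate: transform() returns one value per original row, so it can attach a per-set label. As an illustration, the sketch below labels each row with the first timestamp of its set, assuming (as described in the question) that a set ends when the next file is more than 15 minutes later. The helper name set_start_times is hypothetical.

```python
import pandas as pd


def set_start_times(index, gap="15min"):
    """For each timestamp, return the first timestamp of its set.

    A new set starts whenever the gap to the previous timestamp exceeds `gap`.
    """
    times = pd.Series(index, index=index)
    set_id = (times.diff() > pd.Timedelta(gap)).cumsum()
    # transform("min") broadcasts each group's earliest time back to every row
    return times.groupby(set_id).transform("min")
```

Usage: df['set_start'] = set_start_times(df.index) keeps all rows and adds the start time of each set, which also works as a group label.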

I thought that this might be a useful start, but as far as I understand that solution, it only creates a certain index for a specified index.

Thanks! (I am happy to share an example .txt file and my code, if that is possible here.)

Create your conditions and choices, then use df.between_time and np.select.

import numpy as np

# one boolean mask per window; between_time() matches on time of day only
cond = [df.index.isin(df.between_time('17:25', '17:31').index),
        df.index.isin(df.between_time('18:07', '18:10').index),
        df.index.isin(df.between_time('19:03', '19:04').index)]

choices = ['a', 'b', 'c']

# rows matching no condition get the default; None keeps them as missing values
# (an np.nan default mixed with string choices can end up as the string 'nan')
df['new_col'] = np.select(cond, choices, default=None)

                     values new_col
index                              
2020-09-06 17:25:14      97       a
2020-09-06 17:25:33       0       a
2020-09-06 17:27:00       3       a
2020-09-06 17:28:13       7       a
2020-09-06 17:29:28      10       a
2020-09-06 17:30:07      26       a
2020-09-06 17:30:40      34       a
2020-09-06 17:31:13      34    None
2020-09-06 18:07:34      99       b
2020-09-06 18:08:07       0       b
2020-09-06 18:08:35       3       b
2020-09-06 18:09:00       8       b
2020-09-06 18:09:24      11       b
2020-09-06 18:09:57      32       b
2020-09-06 18:10:24      43    None
2020-09-06 19:03:45      99       c
2020-09-06 19:04:31       0    None
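
In the output, the last row of each set (17:31:13, 18:10:24 and 19:04:31) gets no label: between_time compares the time of day against the bounds, and an end of '17:31' means 17:31:00, so any later seconds fall outside the window. If fixed windows are kept, one option is to push each end time to the next minute. The sketch below assumes the windows are passed as a dict mapping each label to a (start, end) pair; the helper name label_windows is hypothetical, and the windows still have to be known in advance.

```python
import numpy as np
import pandas as pd


def label_windows(df, windows):
    """Label rows whose time of day lies in a [start, end] window; other rows get None."""
    # between_time includes both endpoints by default
    cond = [df.index.isin(df.between_time(start, end).index)
            for start, end in windows.values()]
    return np.select(cond, list(windows.keys()), default=None)
```

Usage: with windows = {'a': ('17:25', '17:32'), 'b': ('18:07', '18:11'), 'c': ('19:03', '19:05')}, df['new_col'] = label_windows(df, windows) labels all 17 sample rows, including the three boundary rows.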

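Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.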

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM