How to assign group labels to pandas df rows whose datetimes fall within specific time intervals?

Question

I am trying to sort .txt files based on the time they were created. A set of 6-8 .txt files is created multiple times a day within only a few minutes. I do not know the exact time intervals, so I will have to find a way to automatically find the closest matching date-times (e.g. all that are less than 15 min apart). I have been able to extract the DateTime for each file. Now, I would like to assign a group label that indicates which .txt files were created in a set (i.e. within a few minutes of each other).

My current df looks like this:

index                         values
2020-09-06 17:25:14           97
2020-09-06 17:25:33            0
2020-09-06 17:27:00            3
2020-09-06 17:28:13            7
2020-09-06 17:29:28           10
2020-09-06 17:30:07           26
2020-09-06 17:30:40           34
2020-09-06 17:31:13           34
2020-09-06 18:07:34           99
2020-09-06 18:08:07            0
2020-09-06 18:08:35            3
2020-09-06 18:09:00            8
2020-09-06 18:09:24           11
2020-09-06 18:09:57           32
2020-09-06 18:10:24           43
2020-09-06 19:03:45           99
2020-09-06 19:04:31            0

I would like to automatically assign label "a" to all rows between 17:25 and 17:31, then label "b" to all rows between 18:07 and 18:10, then label "c" to all rows between 19:03 and 19:04.

Most solutions I have found only aggregate (df.groupby(), df.resample(), pd.Grouper()). Can I use one of these methods to create my labels?

I thought that this might be a useful start, but as far as I understand the solution, it only creates a certain index of a specified index for me.

Thanks (I am happy to share an example .txt file and my code, if that is possible here.)

Answer 1

Create your conditions and choices, then use df.between_time and np.select.

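import numpy as np

# One boolean mask per window. between_time includes both endpoints, and '17:31'
# means 17:31:00, so a row stamped 17:31:13 falls outside the first window.
cond = [df.index.isin(df.between_time('17:25', '17:31').index),
        df.index.isin(df.between_time('18:07', '18:10').index),
        df.index.isin(df.between_time('19:03', '19:04').index)]

# Labels, in the same order as the masks in cond.
choices = ['a', 'b', 'c']

# Rows matching no window get the string 'nan'. A string default is used because
# newer NumPy versions may reject a float default (np.nan) mixed with string choices.
df['new_col'] = np.select(cond, choices, default='nan')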

                     values new_col
index                              
2020-09-06 17:25:14      97       a
2020-09-06 17:25:33       0       a
2020-09-06 17:27:00       3       a
2020-09-06 17:28:13       7       a
2020-09-06 17:29:28      10       a
2020-09-06 17:30:07      26       a
2020-09-06 17:30:40      34       a
2020-09-06 17:31:13      34     nan
2020-09-06 18:07:34      99       b
2020-09-06 18:08:07       0       b
2020-09-06 18:08:35       3       b
2020-09-06 18:09:00       8       b
2020-09-06 18:09:24      11       b
2020-09-06 18:09:57      32       b
2020-09-06 18:10:24      43     nan
2020-09-06 19:03:45      99       c
2020-09-06 19:04:31       0     nan
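
Note that in the output above the last row of each set (17:31:13, 18:10:24, 19:04:31) is left unlabeled, because each window ends at the start of its final minute. Since the question says the exact intervals are not known in advance, a gap-based approach may be a better fit. The following is only a minimal sketch, not part of the original answer: it assumes that a gap of more than 15 minutes between consecutive rows starts a new set, and the names max_gap, gaps, group_id and label are made up for illustration.

```python
import string

import pandas as pd

# Timestamps and values copied from the question's DataFrame.
idx = pd.to_datetime([
    "2020-09-06 17:25:14", "2020-09-06 17:25:33", "2020-09-06 17:27:00",
    "2020-09-06 17:28:13", "2020-09-06 17:29:28", "2020-09-06 17:30:07",
    "2020-09-06 17:30:40", "2020-09-06 17:31:13", "2020-09-06 18:07:34",
    "2020-09-06 18:08:07", "2020-09-06 18:08:35", "2020-09-06 18:09:00",
    "2020-09-06 18:09:24", "2020-09-06 18:09:57", "2020-09-06 18:10:24",
    "2020-09-06 19:03:45", "2020-09-06 19:04:31",
])
values = [97, 0, 3, 7, 10, 26, 34, 34, 99, 0, 3, 8, 11, 32, 43, 99, 0]
df = pd.DataFrame({"values": values}, index=idx).sort_index()

# Assumption: rows more than 15 minutes apart belong to different sets.
max_gap = pd.Timedelta(minutes=15)

# Time elapsed since the previous row (NaT for the first row).
gaps = df.index.to_series().diff()

# True where a new set starts; the running sum numbers the sets 0, 1, 2, ...
group_id = (gaps > max_gap).cumsum()

# Turn the integer ids into letters (enough for up to 26 sets).
df["label"] = [string.ascii_lowercase[i] for i in group_id]

print(df)
```

With the sample data this labels the first eight rows "a", the next seven "b" and the last two "c". Once the label column exists, df.groupby("label") can be used for any per-set aggregation.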

Solution 1
1 2021-04-06 14:00:12