Python DataFrame: extracting a list of unique dates from a large DatetimeIndex with several million rows

Question

My data frame has around 17 million rows. The index is a DatetimeIndex holding about one year of data at roughly one-second resolution. I want to extract a list of unique dates from it.

My code:

# sample df

df.index = DatetimeIndex(['2019-10-01 05:00:00', '2019-10-01 05:00:01',
               '2019-10-01 05:00:05', '2019-10-01 05:00:06',
               '2019-10-01 05:00:08', '2019-10-01 05:00:09',
               '2019-10-01 05:00:12', '2019-10-01 05:00:13',
               '2019-10-01 05:00:15', '2019-10-01 05:00:17',
               ...
               '2020-11-14 19:59:21', '2020-11-14 19:59:23',
               '2020-11-14 19:59:31', '2020-11-14 19:59:32',
               '2020-11-14 19:59:37', '2020-11-14 19:59:38',
               '2020-11-14 19:59:45', '2020-11-14 19:59:46',
               '2020-11-14 19:59:55', '2020-11-14 19:59:56'],
              dtype='datetime64[ns]', name='timestamp', length=17796121, freq=None)
dates = df.index.strftime('%Y-%m-%d').unique()

The code above produces the expected output, but it takes around five minutes. Is there a faster way to get the dates?

Answer 1

Save strftime for when you actually need the strings. It's pretty slow.

Try this:

import numpy as np
dates = np.unique(df.index.date)
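As a rough sketch of the idea in this answer (deduplicate first, format later), the snippet below builds a tiny stand-in index and compares two ways of getting the unique days. The four sample timestamps and the `value` column are made up for illustration. The `normalize()` variant is an alternative that is not part of the original answer; it is shown on the assumption that staying in datetime64 and calling strftime only on the few unique days avoids most of the string-formatting cost. No timings are claimed here.

```python
import numpy as np
import pandas as pd

# Hypothetical small stand-in for the 17-million-row frame in the question.
idx = pd.DatetimeIndex(
    ['2019-10-01 05:00:00', '2019-10-01 05:00:01',
     '2019-10-02 12:30:00', '2020-11-14 19:59:56'],
    name='timestamp',
)
df = pd.DataFrame({'value': range(len(idx))}, index=idx)

# Approach from the answer: array of datetime.date objects, then deduplicate.
dates_obj = np.unique(df.index.date)

# Alternative (assumption, not from the answer): truncate each timestamp to
# midnight and deduplicate, keeping the datetime64 dtype throughout.
unique_days = df.index.normalize().unique()

# Format as strings only at the end, on the handful of unique days.
date_strings = unique_days.strftime('%Y-%m-%d').tolist()
print(date_strings)
```

Either result can be turned into a plain list with `.tolist()`; which one is faster on the real data is worth measuring rather than assuming.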

Answer 1 (3, accepted, 2020-11-21 05:02:57)
