
Date range between 2 columns

I'm kinda new to Python and data science.

I have a dataset with 2 datetime columns, A and B:

                     A                    B
0  2019-03-13 08:12:20  2019-03-13 08:12:25
1  2019-03-15 10:02:18  2019-03-13 10:02:20

For each row, I want to generate the range of timestamps between column A and column B in 1-second steps, so as a result I should get this:

                    A
0 2019-03-13 08:12:20
1 2019-03-13 08:12:21
2 2019-03-13 08:12:22
3 2019-03-13 08:12:23
4 2019-03-13 08:12:24
5 2019-03-13 08:12:25
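For a single row, the target output is what pd.date_range(A, B, freq='s') returns: both endpoints are included, so a row yields (B - A in whole seconds) + 1 timestamps. This also explains why the expected output has only six rows: in row 1, B (2019-03-13) is earlier than A (2019-03-15), so that row's range is empty. A quick check with the sample values ('s' is the seconds alias; recent pandas versions deprecate the uppercase 'S'):

```python
import pandas as pd

# Row 0 of the sample data: A = 08:12:20, B = 08:12:25
rng = pd.date_range("2019-03-13 08:12:20", "2019-03-13 08:12:25", freq='s')
print(len(rng))  # 6: both endpoints are included

# Row 1: B (2019-03-13) is earlier than A (2019-03-15), so the range is empty
empty = pd.date_range("2019-03-15 10:02:18", "2019-03-13 10:02:20", freq='s')
print(len(empty))  # 0
```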

I made it work with this:

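import pandas as pd

df = pd.DataFrame({'A': ["2019-03-13 08:12:20", "2019-03-15 10:02:18"],
                   'B': ["2019-03-13 08:12:25", "2019-03-13 10:02:20"]})
# one date_range per row, fetched with slow row-by-row .iloc lookups
l = [pd.date_range(start=df.iloc[i]['A'], end=df.iloc[i]['B'], freq='s') for i in range(len(df))]
# each range becomes a row; after transposing, column 0 holds only the first row's range
df1 = (pd.DataFrame(l).T)[0]
print(df1)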

But since I have about 1M rows, it takes too long to run, and I know this solution isn't really the best. Can you please show me the best way to do this?

A loop is necessary here; one possible solution uses a list comprehension and flattening:

l = [x for a, b in zip(df.A, df.B) for x in pd.date_range(a, b, freq='s')]
df1 = pd.DataFrame({'A': l})
print(df1)
                    A
0 2019-03-13 08:12:20
1 2019-03-13 08:12:21
2 2019-03-13 08:12:22
3 2019-03-13 08:12:23
4 2019-03-13 08:12:24
5 2019-03-13 08:12:25
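A variation on the answer above, assuming the columns arrive as strings as in the question's sample: parse both columns once with pd.to_datetime so each pd.date_range call receives Timestamps instead of re-parsing strings, and flatten with itertools.chain.from_iterable instead of a nested comprehension. It is still one Python-level date_range call per row, so any speedup is probably modest; time it on your own data.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({'A': ["2019-03-13 08:12:20", "2019-03-15 10:02:18"],
                   'B': ["2019-03-13 08:12:25", "2019-03-13 10:02:20"]})

# Parse the string columns once, up front
A = pd.to_datetime(df['A'])
B = pd.to_datetime(df['B'])

# One DatetimeIndex per row; chain.from_iterable flattens them into a single sequence
l = list(chain.from_iterable(pd.date_range(a, b, freq='s') for a, b in zip(A, B)))
df1 = pd.DataFrame({'A': l})
print(df1)
```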

Another solution:

df1 = (pd.concat([pd.Series(pd.date_range(r.A, r.B, freq='s')) for r in df.itertuples()],
                 ignore_index=True)
         .to_frame('A'))
print(df1)
                    A
0 2019-03-13 08:12:20
1 2019-03-13 08:12:21
2 2019-03-13 08:12:22
3 2019-03-13 08:12:23
4 2019-03-13 08:12:24
5 2019-03-13 08:12:25
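Since the concern is speed on about 1M rows, here is a sketch that avoids calling pd.date_range once per row and builds all timestamps with NumPy array operations instead: np.repeat copies each row's start once per output second, and a per-row offset of 0, 1, ..., n-1 seconds is added to it. It assumes both columns parse with pd.to_datetime and a fixed 1-second step. It has not been benchmarked here, so compare it against the loops above on your data. Keep in mind the result has one row per second of every interval combined, which can be very large for 1M input rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ["2019-03-13 08:12:20", "2019-03-15 10:02:18"],
                   'B': ["2019-03-13 08:12:25", "2019-03-13 10:02:20"]})

start = pd.to_datetime(df['A']).to_numpy()
end = pd.to_datetime(df['B']).to_numpy()

# Timestamps per row: floor((B - A) in seconds) + 1, since both endpoints count;
# rows where B is before A get 0, matching pd.date_range's empty result
seconds = (end - start) / np.timedelta64(1, 's')
counts = np.maximum(np.floor(seconds).astype(np.int64) + 1, 0)

# Repeat each row's start once per output timestamp
starts = np.repeat(start, counts)

# Offset 0, 1, ..., n-1 inside each row's block: global position minus block start
block_start = np.cumsum(counts) - counts
offsets = np.arange(counts.sum()) - np.repeat(block_start, counts)

result = pd.DataFrame({'A': starts + offsets.astype('timedelta64[s]')})
print(result)
```

For the sample data this yields the same six timestamps as the answers above. If you also need to know which input row each timestamp came from, np.repeat(df.index.to_numpy(), counts) gives that row label for every output row.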
