How to convert Unix epoch time to datetime with a timezone in pandas

Question

I have many CSV files containing Unix epoch times that need to be converted to human-readable date/time. The following Python code does the job, but it is very slow.

df['dt'] = pd.to_datetime(df['epoch'], unit='s')
df['dt'] = df.apply(lambda x: x['dt'].tz_localize('UTC').tz_convert('Europe/Amsterdam'), axis=1)

The second line is the bottleneck (~30 seconds for 1 million rows). Even with the aid of multiprocessing it does not scale, as I have more than a billion records in total. How can I make it faster?

Answer 1

The question pertains to pandas; the pure Python version is "Converting unix timestamp string to readable date".
pandas.Series.dt.tz_localize and pandas.Series.dt.tz_convert are both vectorized functions, which don't require using .apply().
- The vectorized implementation is 8159 times faster than .apply() in the 1M-row timing test in this answer (4.4 ms versus 35.9 s).
- The .dt accessor must be used.
It may be better to use pd.to_datetime(df['DT'], unit='s', utc=True) and remove .dt.tz_localize('UTC').

import pandas as pd

# test dataframe with 1M rows
df = pd.DataFrame({'DT': [1349720105, 1349806505, 1349892905, 1349979305, 1350065705]})
df['DT'] = pd.to_datetime(df['DT'], unit='s')
df = pd.concat([df]*200000).reset_index(drop=True)

# display(df.head())
                 DT
2012-10-08 18:15:05
2012-10-09 18:15:05
2012-10-10 18:15:05
2012-10-11 18:15:05
2012-10-12 18:15:05

# convert the column
df['DT'] = df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')

# display(df.head())
                       DT
2012-10-08 20:15:05+02:00
2012-10-09 20:15:05+02:00
2012-10-10 20:15:05+02:00
2012-10-11 20:15:05+02:00
2012-10-12 20:15:05+02:00

print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 1 columns):
 #   Column  Non-Null Count    Dtype                           
---  ------  --------------    -----                           
 0   DT      1000000 non-null  datetime64[ns, Europe/Amsterdam]
dtypes: datetime64[ns, Europe/Amsterdam](1)
memory usage: 7.6 MB
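
As a quick sanity check, the sketch below compares the row-wise approach from the question with the vectorized .dt version on a tiny frame. The column name 'epoch' comes from the question; the variable names are only illustrative.

```python
import pandas as pd

# Small sample frame; the 'epoch' column name is taken from the question.
df = pd.DataFrame({'epoch': [1349720105, 1349806505, 1349892905]})
naive = pd.to_datetime(df['epoch'], unit='s')

# Row-wise version, as in the question (slow on large frames).
row_wise = naive.apply(lambda ts: ts.tz_localize('UTC').tz_convert('Europe/Amsterdam'))

# Vectorized version using the .dt accessor.
vectorized = naive.dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')

# Element-wise comparison: both approaches should give the same timestamps.
same = bool((vectorized == row_wise).all())
print(same)
```

Both versions produce the same values; only the time taken to compute them differs.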

Alternative

This option is more concise: it localizes to 'UTC' while converting to a datetime dtype with pandas.to_datetime().

df['DT'] = pd.to_datetime(df['DT'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')
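
A minimal sketch of why the two spellings are interchangeable, assuming a plain integer Series of epoch seconds. The second value (1356998400, i.e. 2013-01-01 00:00:00 UTC) is not from the original post; it is added only to show that the UTC offset follows daylight saving time.

```python
import pandas as pd

# One autumn timestamp from the answer plus an illustrative winter one
# (1356998400 is 2013-01-01 00:00:00 UTC; this value is an added assumption).
epochs = pd.Series([1349720105, 1356998400])

# Two-step form: parse as naive datetimes, then localize to UTC.
two_step = (pd.to_datetime(epochs, unit='s')
              .dt.tz_localize('UTC')
              .dt.tz_convert('Europe/Amsterdam'))

# One-step form: utc=True makes the parsed result UTC-aware directly.
one_step = pd.to_datetime(epochs, unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')

print(bool((one_step == two_step).all()))
print(one_step.tolist())
```

The autumn value is shown at +02:00 and the winter value at +01:00, so no manual offset arithmetic is needed.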

The most time-consuming part of the OP's original implementation was x['dt'].tz_localize('UTC') within the .apply().
The following two lines run in about the same amount of time, within a few milliseconds of each other.

df['DT_1'] = pd.to_datetime(df['DT'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')
df['DT_2'] = pd.to_datetime(df['DT'], unit='s', utc=True).apply(lambda x: x.tz_convert('Europe/Amsterdam'))

`%%timeit` Testing

1M rows
This tests the comparable vectorized version against the version with .apply() from the OP, where 'DT' has already been converted to a datetime dtype.

%%timeit
df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')
[out]:
4.4 ms ± 494 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
df.apply(lambda x: x['DT'].tz_localize('UTC').tz_convert('Europe/Amsterdam'), axis=1)
[out]:
35.9 s ± 572 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
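
%%timeit is an IPython/Jupyter magic, so it is not available in a plain Python script. The sketch below is one rough way to reproduce the comparison with time.perf_counter. The helper name timed and the reduced row count are assumptions made for illustration; a single run is far less precise than %%timeit.

```python
import time
import pandas as pd

# Smaller frame than the 1M-row benchmark so the row-wise version finishes quickly.
n_rows = 20_000  # illustrative size, not from the original post
df = pd.DataFrame({'DT': pd.to_datetime([1349720105] * n_rows, unit='s')})

def timed(func):
    """Call func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start

# Vectorized conversion via the .dt accessor.
vec, t_vec = timed(
    lambda: df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam'))

# Row-wise conversion, as in the question.
row, t_row = timed(
    lambda: df.apply(
        lambda x: x['DT'].tz_localize('UTC').tz_convert('Europe/Amsterdam'),
        axis=1))

print(f"vectorized: {t_vec:.4f} s, apply: {t_row:.4f} s")
```

Absolute numbers depend on hardware and pandas version, so treat the output as a relative comparison only.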

Solution 1
4 Accepted 2021-01-29 05:04:10
