How to convert unix epoch time to datetime with timezone in pandas
I have many CSV files containing Unix epoch times that need to be converted to human-readable date/time. The following Python code does the job, but it is very slow.
df['dt'] = pd.to_datetime(df['epoch'], unit='s')
df['dt'] = df.apply(lambda x: x['dt'].tz_localize('UTC').tz_convert('Europe/Amsterdam'), axis=1)
The second line is the bottleneck (~30 seconds for 1 million rows). Even with the aid of multiprocessing, this does not scale, as I have more than a billion records in total. How can I make it faster?
This answer uses pandas; the pure Python version is covered in "Converting unix timestamp string to readable date".
pandas.Series.dt.tz_localize and pandas.Series.dt.tz_convert are both vectorized functions, which don't require using .apply().
The vectorized version is 8159 times faster than using .apply() (4.4 ms vs. 35.9 s for 1M rows in the timings at the end of this answer).
When calling these methods on a column (a pandas.Series), the .dt accessor must be used.
Alternatively, use pd.to_datetime(df['DT'], unit='s', utc=True) and remove .dt.tz_localize('UTC').
import pandas as pd
# test dataframe with 1M rows
df = pd.DataFrame({'DT': [1349720105, 1349806505, 1349892905, 1349979305, 1350065705]})
df['DT'] = pd.to_datetime(df['DT'], unit='s')
df = pd.concat([df]*200000).reset_index(drop=True)
# display(df.head())
DT
2012-10-08 18:15:05
2012-10-09 18:15:05
2012-10-10 18:15:05
2012-10-11 18:15:05
2012-10-12 18:15:05
# convert the column
df['DT'] = df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')
# display(df.head())
DT
2012-10-08 20:15:05+02:00
2012-10-09 20:15:05+02:00
2012-10-10 20:15:05+02:00
2012-10-11 20:15:05+02:00
2012-10-12 20:15:05+02:00
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 DT 1000000 non-null datetime64[ns, Europe/Amsterdam]
dtypes: datetime64[ns, Europe/Amsterdam](1)
memory usage: 7.6 MB
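The dtype reported above is timezone-aware, so the UTC offset is worked out per timestamp rather than being fixed at +02:00. A minimal sketch of this, assuming a second epoch value (1356998400, i.e. 2013-01-01 00:00:00 UTC) that is not part of the original data and is added only to show a winter-time date:

```python
import pandas as pd

# Two epoch values in seconds: the first comes from the test data above
# (October, summer time); the second is an assumed winter-time value.
epochs = pd.Series([1349720105, 1356998400])

# Parse as UTC, then convert the whole column in one vectorized call.
local = pd.to_datetime(epochs, unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')

# Each row carries the offset that applied on that date.
print(local.astype(str).tolist())
```

The October timestamp should come out at +02:00 and the January one at +01:00, with no per-row Python code involved.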
Passing utc=True is more concise: it localizes to 'UTC' when converting to a datetime dtype with pandas.to_datetime().
df['DT'] = pd.to_datetime(df['DT'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')
Passing utc=True also removes the need for x['dt'].tz_localize('UTC') within the .apply().
df['DT_1'] = pd.to_datetime(df['DT'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')
df['DT_2'] = pd.to_datetime(df['DT'], unit='s', utc=True).apply(lambda x: x.tz_convert('Europe/Amsterdam'))
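Before comparing speed, it may be worth confirming that the different routes agree. The following is a small sketch, not part of the original answer, that checks the two vectorized forms and the .apply() form against each other on a tiny frame; the column name 'epoch' follows the question, and the variable names are illustrative only.

```python
import pandas as pd

# Tiny stand-in frame; the first two values come from the test data in this answer,
# the third (a winter-time value) is an assumption added for coverage.
df = pd.DataFrame({'epoch': [1349720105, 1349806505, 1356998400]})

# Route A: naive datetime, then localize to UTC, then convert (vectorized).
route_a = (pd.to_datetime(df['epoch'], unit='s')
           .dt.tz_localize('UTC')
           .dt.tz_convert('Europe/Amsterdam'))

# Route B: parse directly as UTC, then convert (vectorized).
route_b = pd.to_datetime(df['epoch'], unit='s', utc=True).dt.tz_convert('Europe/Amsterdam')

# Route C: parse as UTC, then convert row by row with .apply().
route_c = pd.to_datetime(df['epoch'], unit='s', utc=True).apply(
    lambda ts: ts.tz_convert('Europe/Amsterdam'))

# Element-wise comparison of the timezone-aware values.
print(bool((route_a == route_b).all()), bool((route_a == route_c).all()))
```

If all three agree, the choice between them comes down to speed and readability.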
%%timeit Testing
Timing the vectorized version against the .apply() from the OP, where 'DT' has already been converted to a datetime dtype.
%%timeit
df['DT'].dt.tz_localize('UTC').dt.tz_convert('Europe/Amsterdam')
[out]:
4.4 ms ± 494 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df.apply(lambda x: x['DT'].tz_localize('UTC').tz_convert('Europe/Amsterdam'), axis=1)
[out]:
35.9 s ± 572 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
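Since the question mentions many CSV files and more than a billion records, one possible way to apply the vectorized conversion without loading everything at once is to read each file in chunks. This is only a sketch under assumptions: the helper name convert_epoch_chunks, the 'epoch' column name (taken from the question), and the chunk size are illustrative, and an in-memory string stands in for a real file.

```python
import io
import pandas as pd

def convert_epoch_chunks(csv_source, epoch_col='epoch',
                         tz='Europe/Amsterdam', chunksize=1_000_000):
    """Yield DataFrames with a timezone-aware 'dt' column, one chunk at a time.

    Hypothetical helper: the name and parameters are assumptions for illustration.
    """
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Vectorized conversion: parse as UTC, then convert to the target zone.
        chunk['dt'] = pd.to_datetime(chunk[epoch_col], unit='s', utc=True).dt.tz_convert(tz)
        yield chunk

# Small in-memory stand-in for one of the CSV files.
sample_csv = io.StringIO("epoch\n1349720105\n1349806505\n1349892905\n")

# A chunksize of 2 is used here only to force more than one chunk.
chunks = list(convert_epoch_chunks(sample_csv, chunksize=2))
result = pd.concat(chunks, ignore_index=True)
print(result)
```

In practice each chunk would more likely be written out or aggregated as it is produced rather than collected into a list, which keeps memory use bounded by the chunk size.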