
Pandas Series: string date to epoch unix seconds

I have a Pandas DataFrame where one column holds dates as strings, in the format shown below:

0               time
1  September 20 2016  
2  September 20 2016     
3  September 19 2016     
4  September 16 2016

What would be a succinct way to replace time with epoch unix seconds?

You can modify the values of a column with the Series' apply method by passing it a function that performs the conversion you want on each value.

For handling datetimes you can use dateutil.parser.parse to parse arbitrary strings into datetime objects.

import pandas as pd
from datetime import timezone
from dateutil.parser import parse

s = pd.Series(['September 20 2016',
               'September 20 2016',
               'September 19 2016',
               'September 16 2016'])
df = pd.DataFrame(s)

def dt2epoch(value):
    # parse() returns a naive datetime; mark it as UTC so the result
    # does not depend on the machine's local time zone
    d = parse(value).replace(tzinfo=timezone.utc)
    return int(d.timestamp())

df[0].apply(dt2epoch)  # applies the given function to each value of the column

Result:

0    1474329600
1    1474329600
2    1474243200
3    1473984000
Name: 0, dtype: int64

You could try pd.to_datetime, which converts the strings to datetime values:

import pandas as pd
your_df['time']=pd.to_datetime(your_df['time'])

Edit: To get the epoch from the datetime values, convert the series to int64, which gives the number of nanoseconds since the epoch, then divide by 10^9 (the number of nanoseconds in a second).

import numpy as np
your_df['time'] = (pd.to_datetime(your_df['time']).astype(np.int64) / 10**9).astype(np.int64)

The last conversion is needed only if you want integers (the division returns floats).
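As an alternative sketch of the same idea, the epoch can be subtracted explicitly and the difference floor-divided by one second, which yields integers directly and does not depend on the unit in which the timestamps are stored. The frame `df` and the format string `"%B %d %Y"` are assumptions chosen to match the sample data in the question.

```python
import pandas as pd

# Hypothetical frame reproducing the sample data from the question
df = pd.DataFrame({"time": ["September 20 2016",
                            "September 20 2016",
                            "September 19 2016",
                            "September 16 2016"]})

# An explicit format avoids per-value format guessing
parsed = pd.to_datetime(df["time"], format="%B %d %Y")

# Elapsed time since the Unix epoch, floor-divided into whole seconds
epoch = pd.Timestamp("1970-01-01")
df["time"] = (parsed - epoch) // pd.Timedelta(seconds=1)

print(df["time"].tolist())
```

The parsed values are timezone-naive here, so they are interpreted as UTC wall-clock times.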

Note: If your time series contains NaT values, they will show up as the integer value -9223372036. You may want to either filter them out up front or output them as NaN (in which case the resulting series must be of a float type instead of int).
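Both options from the note could look roughly like this. It is a minimal sketch: the sample series `raw` with one unparseable entry is made up for illustration, and `errors="coerce"` is used to turn that entry into NaT. The nullable `Int64` dtype is shown as a further possibility, not something the note itself proposes.

```python
import pandas as pd

# Hypothetical input with one value that cannot be parsed
raw = pd.Series(["September 20 2016", "not a date", "September 16 2016"])

# errors="coerce" turns unparseable strings into NaT instead of raising
parsed = pd.to_datetime(raw, format="%B %d %Y", errors="coerce")

epoch = pd.Timestamp("1970-01-01")
one_second = pd.Timedelta(seconds=1)

# Option 1: drop the NaT rows up front, then convert
seconds_valid = (parsed.dropna() - epoch) // one_second

# Option 2: keep every row; the missing entry stays missing, and the
# nullable Int64 dtype lets the remaining values be stored as integers
seconds_all = ((parsed - epoch) // one_second).astype("Int64")

print(seconds_valid.tolist())
print(seconds_all.isna().tolist())
```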
