
Mixed timestamps in a DataFrame cause an error when converting to int64

I am merging time-indexed data from multiple sources (some times are integer timestamps, some are UTC strings), converting the times to pandas Timestamps for manipulation, and then need to re-export them as epoch timestamps. The problem is that converting the Timestamps back to int64 raises an error when (and only when) the DataFrame contains a mix of UTC and non-UTC Timestamps.

This works:

import pandas as pd

df1 = pd.DataFrame([{'time':1617217320000}])
df1['time'] = pd.to_datetime(df1['time'], unit='ms')
df1['time'] = df1['time'].values.astype('int64') // 10**9

So does this:

df2 = pd.DataFrame([{'time':'2021-03-30T18:52:00.000Z'}])
df2['time'] = pd.to_datetime(df2['time'])
df2['time'] = df2['time'].values.astype('int64') // 10**9

But this does not:

df1 = pd.DataFrame([{'time':1617217320000}])
df1['time'] = pd.to_datetime(df1['time'], unit='ms')
df2 = pd.DataFrame([{'time':'2021-03-30T18:52:00.000Z'}])
df2['time'] = pd.to_datetime(df2['time'])
df = pd.concat([df1, df2], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
df['time'] = df['time'].values.astype('int64') // 10**9

# TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp'

Do I need to normalize these somehow to allow the conversion to work?

If I understand correctly, you can pass the mixed timestamps through pd.to_datetime() with utc=True before the int64 conversion. This treats the timezone-naive values as UTC and converts the timezone-aware ones to UTC, so the whole column ends up with a single datetime dtype:

df['time'] = pd.to_datetime(df['time'], utc=True).astype('int64') // 10**9

#       time
# 1617217320
# 1617130320
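
To see why the normalization matters, it can help to inspect the dtype of the combined column before and after. The following is a minimal sketch, not the only way to do it; the names naive, aware, combined, normalized and epoch_s are made up for illustration. It also computes the epoch seconds by dividing by a Timedelta rather than with `// 10**9`, on the assumption that you may not want to depend on the column having nanosecond resolution (the integer returned by astype('int64') is in the column's own unit).

```python
import pandas as pd

# One tz-naive source (integer milliseconds) and one tz-aware source (UTC string)
naive = pd.DataFrame({'time': pd.to_datetime([1617217320000], unit='ms')})
aware = pd.DataFrame({'time': pd.to_datetime(['2021-03-30T18:52:00.000Z'])})

combined = pd.concat([naive, aware], ignore_index=True)
# Mixing tz-naive and tz-aware values typically leaves an object column,
# which is why a direct astype('int64') on it fails.
print(combined['time'].dtype)

# utc=True localizes naive values as UTC and converts aware values to UTC
normalized = pd.to_datetime(combined['time'], utc=True)
print(normalized.dtype)

# Resolution-independent conversion to whole seconds since the Unix epoch
epoch = pd.Timestamp('1970-01-01', tz='UTC')
epoch_s = (normalized - epoch) // pd.Timedelta('1s')
print(epoch_s.tolist())
```

If the column is known to be nanosecond-resolution, the shorter `.astype('int64') // 10**9` form from the answer above should give the same values.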

You can also use the .value attribute of the pandas Timestamp class ( https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html ), of which both of your values are instances, to convert both kinds of timestamp to int. The attribute holds the number of nanoseconds since the epoch, so integer-divide by 10**9 to get seconds:

>>> df['time'] = df.time.apply(lambda x: x.value // 10**9)

>>> df
     time
0    1617217320
1    1617130320
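
The same idea can be wrapped in a small helper, which makes it easy to check that naive, UTC and non-UTC-offset values all map onto the same epoch scale. This is only a sketch: to_epoch_seconds and the mixed series are hypothetical names, and the third value (with a +02:00 offset) is an extra case added for illustration that is not in the question.

```python
import pandas as pd

def to_epoch_seconds(ts):
    # Timestamp.value is nanoseconds since the Unix epoch; a tz-naive
    # Timestamp is treated as if its wall time were already UTC.
    return ts.value // 10**9

mixed = pd.Series([
    pd.Timestamp(1617217320000, unit='ms'),     # tz-naive
    pd.Timestamp('2021-03-30T18:52:00.000Z'),   # tz-aware, UTC
    pd.Timestamp('2021-03-30T20:52:00+02:00'),  # tz-aware, same instant at +02:00
], dtype=object)

seconds = mixed.apply(to_epoch_seconds)
print(seconds.tolist())
```

Because this runs a Python function per row, it will likely be slower than the vectorized to_datetime(..., utc=True) approach on large frames, but it works directly on an object column.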
