How can I make Pandas convert a column which contains NaT from timedelta to datetime?

I have a pandas dataframe with a column of type timedelta64[ns], which I would like to convert to datetime64[ns].

The pd.to_datetime() function purports to do just that, and has worked in the past, but now appears to fail. I assume this is related to an API quirk that slipped under my radar. Currently it fails with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 724, in to_datetime
    cache_array = _maybe_cache(arg, format, cache, convert_listlike)
  File "/usr/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 152, in _maybe_cache
    cache_dates = convert_listlike(unique_dates, format)
  File "/usr/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 363, in _convert_listlike_datetimes
    arg, _ = maybe_convert_dtype(arg, copy=False)
  File "/usr/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1916, in maybe_convert_dtype
    raise TypeError(f"dtype {data.dtype} cannot be converted to datetime64[ns]")
TypeError: dtype timedelta64[ns] cannot be converted to datetime64[ns]

To reproduce, please use the MWE below:

wget https://chymera.eu/ppb/61ebad.csv
python
import pandas as pd
df = pd.read_csv('61ebad.csv')
df['Animal_death_date'] = pd.to_timedelta(df['Animal_death_date'], errors='coerce')
df['Animal_death_date'] = pd.to_datetime(df['Animal_death_date'], errors='coerce')

The error also occurs if I use errors='ignore'. For reference, I am using Pandas 1.0.1.
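A reproduction that does not depend on the downloaded CSV might look like the sketch below. The two-element Series is a made-up stand-in for the Animal_death_date column, and whether the call raises or returns a result may depend on the pandas version, so the sketch only reports the outcome instead of assuming one.

```python
import pandas as pd

# Made-up stand-in for the CSV column: one timedelta string and one missing value.
raw = pd.Series(["132 days 19:00:00", None])

# Step 1 of the MWE: text -> timedelta64, missing values become NaT.
td = pd.to_timedelta(raw, errors="coerce")

# Step 2 of the MWE: try to turn the timedelta column into datetimes.
try:
    outcome = "converted to " + str(pd.to_datetime(td).dtype)
except TypeError as exc:
    outcome = "TypeError: " + str(exc)

print(td.dtype)
print(outcome)
```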

If you need to convert timedeltas to datetimes, add a start datetime:

import pandas as pd

df = pd.read_csv('https://chymera.eu/ppb/61ebad.csv')
start = pd.to_datetime('2000-01-01')
df['Animal_death_date'] = pd.to_timedelta(df['Animal_death_date'], errors='coerce') + start
print(df['Animal_death_date'])
0                     NaT
1                     NaT
2                     NaT
3                     NaT
4                     NaT

843                   NaT
844                   NaT
845   2000-05-12 19:00:00
846   2000-05-12 19:00:00
847   2000-05-12 19:00:00
Name: Animal_death_date, Length: 848, dtype: datetime64[ns]
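The same idea can be tried without the CSV. This is a minimal sketch: the input Series is made up for illustration, and the reference date 2000-01-01 is as arbitrary here as it is above.

```python
import pandas as pd

# Made-up stand-in for the Animal_death_date column (not the real CSV data).
raw = pd.Series(["132 days 19:00:00", None])

start = pd.Timestamp("2000-01-01")              # arbitrary reference point
deltas = pd.to_timedelta(raw, errors="coerce")  # missing value -> NaT
dates = deltas + start                          # NaT + Timestamp stays NaT

print(dates)
```

A timedelta is a duration, not a point in time, so the result depends entirely on which reference point is chosen.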

Or add a column filled with datetimes:

import pandas as pd

df = pd.read_csv('https://chymera.eu/ppb/61ebad.csv')
start = pd.to_datetime(df['FMRIMeasurement_date'])
df['Animal_death_date'] = pd.to_timedelta(df['Animal_death_date'], errors='coerce') + start
print(df['Animal_death_date'])
0                     NaT
1                     NaT
2                     NaT
3                     NaT
4                     NaT

843                   NaT
844                   NaT
845   2018-10-04 19:20:54
846   2018-10-04 19:20:54
847   2018-10-04 19:20:54
Name: Animal_death_date, Length: 848, dtype: datetime64[ns]
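Adding a column works row by row, and a missing value on either side yields NaT. A small sketch with an invented frame (the column names measured, offset and death_date and all values are hypothetical, not taken from the CSV):

```python
import pandas as pd

# Invented data: a datetime-like text column and a timedelta-like text column.
df = pd.DataFrame({
    "measured": ["2018-05-25 00:20:54", "2018-06-01 08:00:00", None],
    "offset": ["132 days 19:00:00", None, "1 days 00:00:00"],
})

start = pd.to_datetime(df["measured"])                  # per-row reference datetimes
offsets = pd.to_timedelta(df["offset"], errors="coerce")

# Element-wise addition; rows with a missing start or offset become NaT.
df["death_date"] = offsets + start

print(df["death_date"])
```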

Start with a small correction: the source column in question is also a text column, only formatted as a timedelta.

To convert the Animal_death_date column, define the following function:

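import numpy as np
import pandas as pd

def myDateConv(tt):
    # Empty cells become NaN; otherwise parse with an assumed year prefix.
    return pd.to_datetime('2020-' + tt, format='%Y-%j days %X.%f') if len(tt) > 0 else np.nan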

I assume that your dates are from this year, hence 2020 as the initial part of the whole date string. If they are from another year, change this prefix accordingly.

Apply this function as early as possible, when you read the source file:

df = pd.read_csv('61ebad.csv', index_col=0, parse_dates=['Treatment_start_date',
    'Treatment_end_date', 'FMRIMeasurement_date', 'OpenFieldTestMeasurement_date',
    'ForcedSwimTestMeasurement_date', 'CageStay_start_date', 'Cage_Treatment_start_date',
    'Cage_Treatment_end_date', 'SucrosePreferenceMeasurement_date', 'reference_date'],
    converters = { 'Animal_death_date': myDateConv })

Note the additional parameters:

  • index_col - to treat the initial column as the index,
  • parse_dates - to convert "normally" formatted dates to datetime,
  • converters - to apply the above function to the source of the Animal_death_date column.

I think this solution is simpler and more readable than converting particular columns individually.
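How index_col, parse_dates and converters interact can be tried on an inline CSV. This is only a sketch: the two-row CSV text, the REFERENCE date and the death_date_conv helper are all hypothetical, and the helper is a swapped-in variant that adds the timedelta to a reference date instead of parsing it with a '%j' format. The two approaches are not equivalent, because '%j' is a 1-based day of the year.

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for 61ebad.csv.
csv_text = (
    "id,FMRIMeasurement_date,Animal_death_date\n"
    "1,2018-05-25 00:20:54,132 days 19:00:00\n"
    "2,2018-05-26 10:00:00,\n"
)

REFERENCE = pd.Timestamp("2020-01-01")  # assumed reference date


def death_date_conv(text):
    """Converter: receives the raw cell text; empty cells arrive as ''."""
    if not text:
        return pd.NaT
    return REFERENCE + pd.to_timedelta(text)


df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,                              # first column becomes the index
    parse_dates=["FMRIMeasurement_date"],     # ordinary date strings
    converters={"Animal_death_date": death_date_conv},
)

print(df)
```

The converter runs on each raw cell before any type inference, which is why it has to handle the empty string itself.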
