My dataframe has millions of records. I have to convert some of the string columns to datetime. I'm doing it as follows:
allData['Col1'] = pd.to_datetime(allData['Col1'])
However, some of the strings are not valid datetime strings, so I get a ValueError. I'm not very good at debugging in Python, so I'm struggling to find out why some of the data items are not convertible.
I need Python to show me the row number, as well as the value that is not convertible, instead of throwing out a useless error that tells me nothing. How can I achieve this?
You can use boolean indexing with a condition that checks for NaT values with isnull on the result of to_datetime called with the parameter errors='coerce', which creates NaT wherever the datetime is invalid:
allData1 = allData[pd.to_datetime(allData['Col1'], errors='coerce').isnull()]
Sample:
import pandas as pd

allData = pd.DataFrame({'Col1':['2015-01-03','a','2016-05-08'],
                        'B':[4,5,6],
                        'C':[7,8,9],
                        'D':[1,3,5],
                        'E':[5,3,6],
                        'F':[7,4,3]})
print (allData)
   B  C        Col1  D  E  F
0  4  7  2015-01-03  1  5  7
1  5  8           a  3  3  4
2  6  9  2016-05-08  5  6  3
print (pd.to_datetime(allData['Col1'], errors='coerce'))
0 2015-01-03
1 NaT
2 2016-05-08
Name: Col1, dtype: datetime64[ns]
print (pd.to_datetime(allData['Col1'], errors='coerce').isnull())
0 False
1 True
2 False
Name: Col1, dtype: bool
allData1 = allData[pd.to_datetime(allData['Col1'], errors='coerce').isnull()]
print (allData1)
   B  C Col1  D  E  F
1  5  8    a  3  3  4
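If you only want the row label and the offending value rather than the whole row, the same mask can be applied to the column itself. The following is a minimal sketch, not part of the original answer: the names converted, bad_mask and bad_values are made up for illustration, and the extra notnull() check is an assumption that values which were already missing should not be reported as invalid.

```python
import pandas as pd

# Same kind of sample data as above; 'a' is the only invalid datetime string.
allData = pd.DataFrame({'Col1': ['2015-01-03', 'a', '2016-05-08'],
                        'B': [4, 5, 6]})

# Invalid strings become NaT instead of raising an error.
converted = pd.to_datetime(allData['Col1'], errors='coerce')

# Flag values that failed to parse but were not missing to begin with
# (illustrative assumption: pre-existing NaN/None should not be reported).
bad_mask = converted.isnull() & allData['Col1'].notnull()

# Series of offending values, indexed by their row labels.
bad_values = allData.loc[bad_mask, 'Col1']

for row_label, value in bad_values.items():
    print(row_label, repr(value))
```

With the sample data this prints the index label 1 together with the value 'a'. On a frame with millions of rows, bad_values.unique() may be a quicker way to see which distinct strings fail to parse.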