Python Pandas Debugging on to_datetime

My dataframe holds millions of records. I have to convert some of the string columns to datetime. I'm doing it as follows:

allData['Col1'] = pd.to_datetime(allData['Col1'])

However, some of the strings are not valid datetime strings, so I get a ValueError. I'm not very good at debugging in Python, so I'm struggling to find out why some of the data items are not convertible.

I need Python to show me the row number, as well as the value that is not convertible, instead of throwing out a useless error that tells me nothing. How can I achieve this?

Call to_datetime with the parameter errors='coerce', which returns NaT wherever a value is not a valid datetime instead of raising an error. Then check for those NaT values with isnull and use the resulting mask for boolean indexing to select the offending rows:

allData1 = allData[pd.to_datetime(allData['Col1'], errors='coerce').isnull()]

Sample:

allData = pd.DataFrame({'Col1':['2015-01-03','a','2016-05-08'],
                        'B':[4,5,6],
                        'C':[7,8,9],
                        'D':[1,3,5],
                        'E':[5,3,6],
                        'F':[7,4,3]})

print (allData)
   B  C        Col1  D  E  F
0  4  7  2015-01-03  1  5  7
1  5  8           a  3  3  4
2  6  9  2016-05-08  5  6  3

print (pd.to_datetime(allData['Col1'], errors='coerce'))
0   2015-01-03
1          NaT
2   2016-05-08
Name: Col1, dtype: datetime64[ns]

print (pd.to_datetime(allData['Col1'], errors='coerce').isnull())
0    False
1     True
2    False
Name: Col1, dtype: bool


allData1 = allData[pd.to_datetime(allData['Col1'], errors='coerce').isnull()]
print (allData1)
   B  C Col1  D  E  F
1  5  8    a  3  3  4
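
Since the question asks for the row number together with the offending value, the same mask can also be used to list just those pairs. The following is a minimal sketch rather than a definitive solution: the names parsed, bad_mask and bad_values are made up for illustration, and the extra notnull() check is an assumption on my part that values which were already missing should not be reported as invalid (errors='coerce' returns NaT for them as well).

```python
import pandas as pd

# Reduced version of the sample frame above
allData = pd.DataFrame({'Col1': ['2015-01-03', 'a', '2016-05-08'],
                        'B': [4, 5, 6]})

# Unparseable strings become NaT instead of raising a ValueError
parsed = pd.to_datetime(allData['Col1'], errors='coerce')

# Rows that failed to parse but were not missing to begin with
# (hypothetical refinement, not part of the original answer)
bad_mask = parsed.isnull() & allData['Col1'].notnull()

# Index labels and original values of the offending rows
bad_values = allData.loc[bad_mask, 'Col1']
for row_label, value in bad_values.items():
    print(row_label, repr(value))
```

The labels printed are the index labels of the dataframe. With the default RangeIndex they coincide with the row numbers; if the index has been changed, calling reset_index() first would give positional numbers.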


 