简体   繁体   中英

Python ValueError: time data 'Date' does not match format '%Y/%m/%d %H:%M:%S.%f'

I am parsing dates from a Dataframe where the data comes from a CSV file. I get the said error. I am sure the format is correct.

My Code:

import pandas as pd
    from datetime import datetime
    import csv
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    headers = ['Sensor Value','Date','Time']
    df = pd.read_csv('C:/Users\Lala Rushan\Downloads\DataLog.CSV',names=headers)
    print (df)

    df['Date'] = df['Date'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
    x = df['Date']
    y = df['Sensor Value']

    # plot
    plt.plot(x,y)
    # beautify the x-labels
    plt.gcf().autofmt_xdate()

    plt.show()

My DataFrame:

0    Sensor Value         Date           Time
1               2   2017/02/17   19:06:17.188
2              72   2017/02/17   19:06:22.360
3              72   2017/02/17   19:06:27.348

Console Error:

   new_values = map_f(values, arg)
  File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer     (pandas\lib.c:66124)
  File "C:/Users/Lala Rushan/PycharmProjects/newgraph/newgraph.py", line 10, in <lambda>
  df['Date'] = df['Date'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
  File "C:\Users\Lala    Rushan\AppData\Local\Programs\Python\Python35\lib\_strptime.py", line 500, in _strptime_datetime
  tt, fraction = _strptime(data_string, format)
  File "C:\Users\Lala   Rushan\AppData\Local\Programs\Python\Python35\lib\_strptime.py", line 337, in _strptime
  (data_string, format))
 ValueError: time data 'Date' does not match format '%Y/%m/%d %H:%M:%S.%f'

CSV Input:

    Sensor Value    Date    Time
2    2017/02/17  19:06:17.188
72   2017/02/17  19:06:22.360
72   2017/02/17  19:06:27.348
72   2017/02/17  19:06:32.482
74   2017/02/17  19:06:37.515
70   2017/02/17  19:06:42.580

I think you need to_datetime with parameter errors='coerce' for parse problematic values to NaN :

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

For check problematic rows:

print (df[pd.to_datetime(df['Date'], errors='coerce').isnull()])

But if need read columns date and time to datetime use parameter parse_dates in read_csv :

import pandas as pd
from pandas.compat import StringIO

temp=u"""
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360
72,2017/02/17,19:06:27.348
72,2017/02/17,19:06:32.482
74,2017/02/17,19:06:37.515
70,2017/02/17,19:06:42.580"""
#after testing replace 'StringIO(temp)' to 'C:/Users\Lala Rushan\Downloads\DataLog.CSV'
headers = ['Sensor Value','Date','Time']
df = pd.read_csv(StringIO(temp), names=headers, parse_dates={'Datetime':['Date','Time']})
print (df)
                 Datetime  Sensor Value
0 2017-02-17 19:06:17.188             2
1 2017-02-17 19:06:22.360            72
2 2017-02-17 19:06:27.348            72
3 2017-02-17 19:06:32.482            72
4 2017-02-17 19:06:37.515            74
5 2017-02-17 19:06:42.580            70

print (df.dtypes)
Datetime        datetime64[ns]
Sensor Value             int64
dtype: object

First solution combined with second, where in last row is 30.2.2017 what does not exist:

temp=u"""
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360
72,2017/02/17,19:06:27.348
72,2017/02/17,19:06:32.482
74,2017/02/17,19:06:37.515
70,2017/02/30,19:06:42.580"""
#after testing replace 'StringIO(temp)' to 'C:/Users\Lala Rushan\Downloads\DataLog.CSV'
headers = ['Sensor Value','Date','Time']
df = pd.read_csv(StringIO(temp), names=headers, parse_dates={'Datetime':['Date','Time']})
print (df)
                  Datetime  Sensor Value
0  2017/02/17 19:06:17.188             2
1  2017/02/17 19:06:22.360            72
2  2017/02/17 19:06:27.348            72
3  2017/02/17 19:06:32.482            72
4  2017/02/17 19:06:37.515            74
5  2017/02/30 19:06:42.580            70

df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce')
print (df)
                 Datetime  Sensor Value
0 2017-02-17 19:06:17.188             2
1 2017-02-17 19:06:22.360            72
2 2017-02-17 19:06:27.348            72
3 2017-02-17 19:06:32.482            72
4 2017-02-17 19:06:37.515            74
5                     NaT            70 <- replace 30.2.2017 to NaT (same as NaN for dates)

print (df.dtypes)
Datetime        datetime64[ns]
Sensor Value             int64
dtype: object

Remove the names=headers part as this is confusing pandas . pandas assumes that the first row is a header rowby default. By specifying the header names, it assumes that the first row must be data and so your error is that the word Date does not match your format (which it doesn't)

EDIT: just change the read_csv line to this:

df = pd.read_csv('C:/Users\Lala Rushan\Downloads\DataLog.CSV')

Consequently, you don't have to specify headers anymore so you can remove that line.

EDIT2:

The problem is that your date and time fields are deperate. Create a new column, called DateTime, which combines the two and then apply your striptime function to that.

Replace the line I told you to delete, the df['Date'] = df['Date'].map one with this

df['DateTime'] = df['Date'] + " " + df['Time']
df['DateTime'] = df['DateTime'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))

I had a similar problem while parsing dates in a large csv file. In my case, I had a few bad lines in the csv that were triggering the error, so I simply removed them from the dataframe, to later parse the dates.

If you don't mind to lose this information, you can do something like this:

df = df[df['Date'].str.contains(r'^\d{4}-\d{2}-\d{2} \d{2}\:\d{2}:\d{2}.\d{3}')]

timer = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')       

df['Date'] =  df['Date'].apply(timer)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM