Pandas to_datetime loses timezone

My raw data has a column with timestamps in ISO8601 format like this:

'2017-07-25T06:00:02+02:00'

Since the data is in CSV format, it will be read as object/string. Therefore I'm converting it to datetime like this.

import pandas as pd
df['time'] = pd.to_datetime(df['time'], utc=False)

#df['time'][0]
df['time'][0].isoformat()

Unfortunately, this converts the timestamps to UTC and the timezone is lost. For instance, df['time'][0].tzinfo is not set.

Timestamp('2017-07-25 04:00:02')

'2017-07-25T04:00:02'

I'm looking for a way to keep the timezone info in each of the timestamp objects, but without re-setting it to CEST (Central European Summer Time) afterwards, since this information is already included in the ISO8601 timezone offset in the raw data. Any idea how to do this?

So here's how I solved it.

There's a great article about timezones and Python, which helped me arrive at a solution. It relies on the iso8601 Python package.

import iso8601
import pandas as pd

times = ['2017-07-25 06:00:02+02:00',
         '2017-07-25 08:15:08+02:00',
         '2017-07-25 12:08:00+02:00',
         '2017-07-25 13:10:12+02:00',
         '2017-07-25 15:11:55+02:00',
         '2017-07-25 16:00:00+02:00'
        ]

df = pd.DataFrame(times, columns=['time'])
df['time'] = df['time'].apply(iso8601.parse_date)
df['time'][0]

Which produces the following output and keeps the timezone information.

Timestamp('2017-07-25 06:00:02+0200', tz='+02:00')
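
If adding a dependency is not desirable, the same idea can be sketched with only the standard library. This is a minimal sketch, assuming Python 3.7 or later (where datetime.fromisoformat is available); the raw_times list is illustrative sample data in the same format as the question, not the original CSV.

```python
from datetime import datetime, timedelta

# Illustrative sample values in the same format as the raw data.
raw_times = [
    "2017-07-25T06:00:02+02:00",
    "2017-07-25T08:15:08+02:00",
]

# fromisoformat keeps the UTC offset from the string as a
# fixed-offset tzinfo instead of converting the value to UTC.
parsed = [datetime.fromisoformat(s) for s in raw_times]

first = parsed[0]
print(first.isoformat())   # 2017-07-25T06:00:02+02:00
print(first.utcoffset())   # 2:00:00
```

The parsed values carry a fixed +02:00 offset rather than a named zone such as Europe/Berlin, which matches what the raw data actually contains. The function could presumably be passed to df['time'].apply in place of iso8601.parse_date, although how pandas displays the resulting column may vary by version.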
