
Python - Parse object index with multiple time zones

How do I parse an object-dtype index in a DataFrame into its date, time, and time zone when it contains multiple time zones?

The format is "YYYY-MM-DD HH:MM:SS-HH:MM", where the trailing "-HH:MM" is the UTC offset (the time zone).

Example: Midnight Jan 1st, 2020 in Mountain Time, counting up:

2020-01-01 00:00:00-07:00
2020-01-01 01:00:00-07:00
2020-01-01 02:00:00-07:00
2020-01-01 04:00:00-06:00

I've got code that works for one time zone, but it breaks when a second timezone is introduced.

df['Date'] = pd.to_datetime(df.index)
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['month_n'] = df['Date'].dt.month_name()
df['day'] = df['Date'].dt.day
df['day_n'] = df['Date'].dt.day_name()
df['h'] = df['Date'].dt.hour
df['mn'] = df['Date'].dt.minute
df['s'] = df['Date'].dt.second

ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc="True"

Use pandas.Series.apply instead (df['Date'] is a Series, so the function is called once per timestamp):

df['Date'] = pd.to_datetime(df.index)

df_info = df['Date'].apply(lambda t: pd.Series({
    'date': t.date(),
    'year': t.year,
    'month': t.month,
    'month_n': t.strftime("%B"),
    'day': t.day,
    'day_n': t.strftime("%A"),
    'h': t.hour,
    'mn': t.minute,
    's': t.second,
}))

df = pd.concat([df, df_info], axis=1)
print(df)
# Output:

                                                    Date        date  year  month  month_n  day      day_n  h  mn  s
    col                                                                                                             
    2020-01-01 00:00:00-07:00  2020-01-01 00:00:00-07:00  2020-01-01  2020      1  January    1  Wednesday  0   0  0
    2020-01-01 01:00:00-07:00  2020-01-01 01:00:00-07:00  2020-01-01  2020      1  January    1  Wednesday  1   0  0
    2020-01-01 02:00:00-07:00  2020-01-01 02:00:00-07:00  2020-01-01  2020      1  January    1  Wednesday  2   0  0
    2020-01-01 04:00:00-06:00  2020-01-01 04:00:00-06:00  2020-01-01  2020      1  January    1  Wednesday  4   0  0
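
The question also asks for the time zone, which the columns above do not include. Here is a minimal sketch of one way to recover it per row, using only the standard library. It assumes Python 3.7+ and that every label matches the format in the question. The helper name `split_timestamp` and the key `utc_offset_hours` are made up for illustration and are not part of the answer above.

```python
from datetime import datetime, timedelta

stamps = [
    "2020-01-01 00:00:00-07:00",
    "2020-01-01 01:00:00-07:00",
    "2020-01-01 02:00:00-07:00",
    "2020-01-01 04:00:00-06:00",
]

def split_timestamp(text):
    """Split one timestamp string into date, wall-clock time, and UTC offset."""
    t = datetime.fromisoformat(text)  # keeps the offset as tzinfo
    return {
        "date": t.date(),
        "time": t.time(),  # local wall-clock time, without tzinfo
        "utc_offset_hours": t.utcoffset() / timedelta(hours=1),
    }

rows = [split_timestamp(s) for s in stamps]
for row in rows:
    print(row["date"], row["time"], row["utc_offset_hours"])
```

A function like this could also be passed to `apply`, at the same per-row cost as the lambda above.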

@abokey's answer is great if you aren't sure of the actual time zone or cannot work in UTC. However, it gives up the dt accessor and the performance of a vectorized approach.

If you can work in UTC, or assign a named time zone such as "America/Denver" (at the moment you only have UTC offsets), everything works as expected:

import pandas as pd

df = pd.DataFrame({'v': [999,999,999,999]},
                   index = ["2020-01-01 00:00:00-07:00",
                            "2020-01-01 01:00:00-07:00",
                            "2020-01-01 02:00:00-07:00",
                            "2020-01-01 04:00:00-06:00"])

df['Date'] = pd.to_datetime(df.index, utc=True)

print(df.Date.dt.hour)
# 2020-01-01 00:00:00-07:00     7
# 2020-01-01 01:00:00-07:00     8
# 2020-01-01 02:00:00-07:00     9
# 2020-01-01 04:00:00-06:00    10
# Name: Date, dtype: int64

# Note: the hours changed because the timestamps were converted to UTC.

Or, converting to a named time zone:

df['Date'] = pd.to_datetime(df.index, utc=True).tz_convert("America/Denver")
print(df.Date.dt.hour)
# 2020-01-01 00:00:00-07:00    0
# 2020-01-01 01:00:00-07:00    1
# 2020-01-01 02:00:00-07:00    2
# 2020-01-01 04:00:00-06:00    3
# Name: Date, dtype: int64
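
Both conversions above discard the original offsets. If those matter, one possible vectorized sketch is to split each label as text. This assumes every index label ends in a six-character offset such as "-07:00". The column names `offset`, `local`, and `utc` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"v": [999, 999, 999, 999]},
    index=[
        "2020-01-01 00:00:00-07:00",
        "2020-01-01 01:00:00-07:00",
        "2020-01-01 02:00:00-07:00",
        "2020-01-01 04:00:00-06:00",
    ],
)

# Assumption: the last six characters of every label are the UTC offset.
df["offset"] = df.index.str[-6:]

# Parsing the label without its offset gives the naive local wall-clock time.
df["local"] = pd.to_datetime(df.index.str[:-6])

# Parsing the full label with utc=True gives a single tz-aware UTC column.
df["utc"] = pd.to_datetime(df.index, utc=True)

print(df[["offset", "local", "utc"]])
print(df["local"].dt.hour.tolist())
print(df["utc"].dt.hour.tolist())
```

Both `local` and `utc` are proper datetime columns here, so the dt accessor works on each, while `offset` stays plain text.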


 