简体   繁体   中英

Python - Combining Month, Day, Year into Date Column

I am trying to create a Date column from multiple columns Year, Month, and Day, however I get a few errors using standard procedure for creating date columns.

import pandas as pd
from urllib.request import urlopen

url = "https://www.ndbc.noaa.gov/view_text_file.php?filename=42887h2017.txt.gz&dir=data/historical/stdmet/"
data_csv = urlopen(url)
df = pd.read_csv(data_csv, delim_whitespace=True, index_col=0, parse_dates=True)

#Reset Index
df.reset_index(level=0, inplace=True)

#remove 1st row contains erronous characters
df = df.iloc[1:]

#Rename Year column
df = df.rename(columns={'#YY': 'YY'})

df['Date'] = pd.to_datetime((df.YY*10000+df.MM*100+df.DD).apply(str),format='%Y%m%d') 

The last command does not work properly as no date column is created. Result:

ValueError: unconverted data remains: 4201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201420142014201

pd.to_datetime can automatically parse dates from multiple columns if they are named properly ( 'year', 'month', 'day', 'hour', 'minute' )

pd.to_datetime(df[['YY', 'MM', 'DD']].rename(columns={'YY': 'year', 'MM': 'month', 'DD': 'day'}))

Output:

1      2017-01-02
2      2017-01-02
3      2017-01-02
4      2017-01-02
5      2017-01-02
...
2427   2017-03-05
2428   2017-03-05
2429   2017-03-05
2430   2017-03-05

You can also add hours and minutes:

pd.to_datetime(df[['YY', 'MM', 'DD', 'hh', 'mm']].rename(
                columns={'YY': 'year', 'MM': 'month', 'DD': 'day',
                         'hh': 'hour', 'mm': 'minute'}))
#1      2017-01-02 06:00:00
#2      2017-01-02 06:20:00
#...
#2429   2017-03-05 01:40:00
#2430   2017-03-05 02:00:00

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM