
How to Convert Abnormal Timestamp into datetime in Pandas dataframe

I'm creating a usage heatmap for some user analytics. The Y-axis will be day of the week and the X-axis will be hour of the day (24-hour clock). I pulled the data from the API. (Note that the real call actually produces 6,000 rows of data.)

IN:

import requests
import pandas as pd

response = requests.get("api.url")
data = response.json()
df = pd.DataFrame(data['Sessions'])
df.dtypes
print(df['StartTime'])

OUT:

0     2019-01-29T22:08:40
1     2019-01-29T22:08:02
2     2019-01-29T22:05:10
3     2019-01-29T21:34:30
4     2019-01-29T21:32:49
Name: StartTime, Length: 100, dtype: object

I would normally convert the object column to a pandas datetime and then split it into two columns:

IN:

df['StartTime'] =  pd.to_datetime(df['StartTime'], format='%d%b%Y:%H:%M:%S.%f')
df['Date'] = [d.date() for d in df['StartTime']]
df['Time'] = [d.time() for d in df['StartTime']]

OUT:

      StartTime                Date           Time
0     2019-01-29T22:08:40      2019-01-29     22:08:40
1     2019-01-29T22:08:02      2019-01-29     22:08:02
2     2019-01-29T22:05:10      2019-01-29     22:05:10
3     2019-01-29T21:34:30      2019-01-29     21:34:30
4     2019-01-29T21:32:49      2019-01-29     21:32:49

This isn't working because of that funky "T" in the middle of my timestamp and possibly because of the datatype.

I need to remove the T so I can convert this to a standard datetime format, then I need to separate Date and Time into their own columns. BONUS: I'd like to bring only the hour into its own column. Instead of 22:08:02, it would just be 22.

You can use pd.Timestamp, which parses this format directly:

>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')

So:

df['StartTime'] = df["StartTime"].apply(lambda x: pd.Timestamp(x))

# Now StartTime has the correct data type, so you can access
# the date and time methods as well as the hour.

df['Date'] = df["StartTime"].apply(lambda x: x.date())
df['Time'] = df["StartTime"].apply(lambda x: x.time())
df['Hour'] = df["StartTime"].apply(lambda x: x.hour)

As mentioned by @coldspeed, calling pd.to_datetime() or pd.Timestamp() works just fine; just omit the format argument.
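For what it's worth, here is a minimal sketch of the same idea without apply(), using pd.to_datetime() on the whole column and the .dt accessor. The sample values are copied from the question's output; the real DataFrame would come from the API call, so treat the hand-built df here as an assumption.

```python
import pandas as pd

# Sample data copied from the question; the real frame comes from the API.
df = pd.DataFrame({
    "StartTime": [
        "2019-01-29T22:08:40",
        "2019-01-29T22:08:02",
        "2019-01-29T22:05:10",
        "2019-01-29T21:34:30",
        "2019-01-29T21:32:49",
    ]
})

# No format argument: ISO 8601 strings with a "T" separator parse as-is.
df["StartTime"] = pd.to_datetime(df["StartTime"])

# The .dt accessor works on the whole column at once.
df["Date"] = df["StartTime"].dt.date
df["Time"] = df["StartTime"].dt.time
df["Hour"] = df["StartTime"].dt.hour

print(df)
```

On 6,000 rows either approach is fast enough, but the .dt version is usually shorter to read.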

For parsing the timestamp, dateutil is fantastic. It can figure out a date from nearly any string format.

To get just the hour from a datetime object you can use d.hour
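A small sketch of what that could look like, assuming the python-dateutil package is available (it is installed alongside pandas). The timestamp string is one of the values from the question.

```python
from dateutil import parser

# parser.parse accepts the ISO-style string, including the "T" separator.
d = parser.parse("2019-01-29T22:08:02")

print(d)       # a datetime.datetime object
print(d.hour)  # the hour as an int
```

To apply this to a whole column you would have to call it once per row, so for a DataFrame the pandas parsers are probably the more convenient choice.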

You don't need to pass a format for the timestamp. Pandas recognizes datetime strings like '2019-01-29T21:34:30' on its own.

IN:

import pandas as pd    
dt = '2019-01-29T21:34:30'    
pd.to_datetime(dt)

OUT:

Timestamp('2019-01-29 21:34:30')
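Once the column is a real datetime, the day-of-week by hour counts for the heatmap mentioned in the question can be built in one step. This is only a sketch under assumptions: it uses pd.crosstab() as one possible way to count sessions, the sample values are copied from the question, and the names starts, usage, "Day" and "Hour" are made up for illustration.

```python
import pandas as pd

# Sample start times copied from the question.
starts = pd.to_datetime(pd.Series([
    "2019-01-29T22:08:40",
    "2019-01-29T22:08:02",
    "2019-01-29T22:05:10",
    "2019-01-29T21:34:30",
    "2019-01-29T21:32:49",
]))

# Rows: day of week; columns: hour of day; values: number of sessions.
usage = pd.crosstab(
    starts.dt.day_name(),
    starts.dt.hour,
    rownames=["Day"],
    colnames=["Hour"],
)

print(usage)
```

The resulting table can then be handed to whatever plotting library you use for the heatmap. Days and hours with no sessions will not appear unless you reindex the table yourself.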
