
How to Convert Abnormal Timestamp into datetime in Pandas dataframe

I'm creating a usage heatmap for some user analytics. The Y-axis will be the day of the week and the X-axis will be the hour of the day (24:00). I pulled the data from the API. (Note that this actually produces 6,000 rows of data.)

IN:

import requests
import json
import pandas as pd

response = requests.get("api.url")
data = response.json()
df = pd.DataFrame(data['Sessions'])
df.dtypes
print(df['StartTime'])

OUT:

0     2019-01-29T22:08:40
1     2019-01-29T22:08:02
2     2019-01-29T22:05:10
3     2019-01-29T21:34:30
4     2019-01-29T21:32:49
Name: StartTime, Length: 100, dtype: object

I would normally convert the object column to a pandas datetime and then split it into two columns:

IN:

df['StartTime'] =  pd.to_datetime(df['StartTime'], format='%d%b%Y:%H:%M:%S.%f')
df['Date'] = [d.date() for d in df['StartTime']]
df['Time'] = [d.time() for d in df['StartTime']]

OUT:

      StartTime                Date           Time
0     2019-01-29T22:08:40      2019-01-29     22:08:40
1     2019-01-29T22:08:02      2019-01-29     22:08:02
2     2019-01-29T22:05:10      2019-01-29     22:05:10
3     2019-01-29T21:34:30      2019-01-29     21:34:30
4     2019-01-29T21:32:49      2019-01-29     21:32:49

This isn't working because of that funky "T" in the middle of my timestamp, and possibly because of the datatype.

I need to remove the T so I can convert this to a standard datetime format, then I need to separate Date and Time into their own columns. BONUS: I'd like to bring only the hour into its own column. Instead of 22:08:02, it would just be 22.

You need to use pandas Timestamp:

>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')

So:

df['StartTime'] = df["StartTime"].apply(lambda x: pd.Timestamp(x))

# Now StartTime has the correct data type, so you can access
# the date and time methods as well as the hour

df['Date'] = df["StartTime"].apply(lambda x: x.date())
df['Time'] = df["StartTime"].apply(lambda x: x.time())
df['Hour'] = df["StartTime"].apply(lambda x: x.hour)

As mentioned by @coldspeed, calling pd.to_datetime() or pd.Timestamp() would work just fine; just omit the format argument.
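The same conversion can also be done on the whole column at once instead of row by row with apply. The following is a minimal sketch, assuming the StartTime strings look like the ones in the question; the three-row sample DataFrame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample mirroring the StartTime strings shown in the question.
df = pd.DataFrame(
    {"StartTime": ["2019-01-29T22:08:40", "2019-01-29T22:08:02", "2019-01-29T21:34:30"]}
)

# No format argument: pandas recognizes the ISO 8601 layout, including the "T".
df["StartTime"] = pd.to_datetime(df["StartTime"])

# The .dt accessor exposes datetime components for the whole column at once.
df["Date"] = df["StartTime"].dt.date
df["Time"] = df["StartTime"].dt.time
df["Hour"] = df["StartTime"].dt.hour

print(df)
```

On a few thousand rows either approach should be fine; the .dt accessor mainly saves the per-row lambda calls.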

For parsing the timestamp, dateutil is fantastic. It can figure out a date from nearly any string format.

To get just the hour from a datetime object, you can use d.hour.
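As a rough illustration of the dateutil route, the sketch below parses one of the strings from the question and reads the hour off the result. It assumes the python-dateutil package is available (it is installed as a pandas dependency); the variable name d is only for illustration.

```python
from dateutil import parser

# Parse one timestamp string from the question; the "T" separator is handled.
d = parser.parse("2019-01-29T22:08:02")

print(d.date())  # the calendar date part
print(d.time())  # the time-of-day part
print(d.hour)    # just the hour, as an integer
```

For a whole DataFrame column this would have to be applied per row, so it is probably better suited to one-off strings than to the 6,000-row case.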

You don't need to format the timestamp. Pandas can recognize a datetime format like '2019-01-29T21:34:30'.

IN:

import pandas as pd
dt = '2019-01-29T21:34:30'
pd.to_datetime(dt)

OUT:

Timestamp('2019-01-29 21:34:30')
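If an explicit format is still wanted, it has to describe the strings as they actually appear. The sketch below, using a made-up two-row Series, suggests that the format from the question fails because it describes a different layout (day, abbreviated month, year, colon), not because of the "T" itself, and that a format with a literal "T" parses the data.

```python
import pandas as pd

# Hypothetical sample with the same layout as the StartTime column.
s = pd.Series(["2019-01-29T22:08:40", "2019-01-29T21:34:30"])

# The question's format describes strings like "29Jan2019:22:08:40.000000",
# so it does not match this data and pandas raises ValueError.
try:
    pd.to_datetime(s, format="%d%b%Y:%H:%M:%S.%f")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# A format that mirrors the data, with the "T" written as a literal character.
parsed = pd.to_datetime(s, format="%Y-%m-%dT%H:%M:%S")

print(mismatch_raised)
print(parsed.dt.hour.tolist())
```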
