How to convert an unusual timestamp to datetime in a pandas DataFrame

Question

I'm creating a usage heatmap for some user analytics. The Y-axis will be the day of the week and the X-axis will be the hour of the day (24-hour clock). I pulled the data from the API. (Note that the real request actually produces 6,000 rows of data.)

IN:

import requests
import pandas as pd

response = requests.get("api.url")  # placeholder URL from the original post
data = response.json()
df = pd.DataFrame(data['Sessions'])
print(df.dtypes)
print(df['StartTime'])

OUT:

0     2019-01-29T22:08:40
1     2019-01-29T22:08:02
2     2019-01-29T22:05:10
3     2019-01-29T21:34:30
4     2019-01-29T21:32:49
Name: StartTime, Length: 100, dtype: object

I would normally convert the object column to a pandas datetime and then split it into two columns:

IN:

df['StartTime'] =  pd.to_datetime(df['StartTime'], format='%d%b%Y:%H:%M:%S.%f')
df['Date'] = [d.date() for d in df['StartTime']]
df['Time'] = [d.time() for d in df['StartTime']]

OUT:

      StartTime                Date           Time
0     2019-01-29T22:08:40      2019-01-29     22:08:40
1     2019-01-29T22:08:02      2019-01-29     22:08:02
2     2019-01-29T22:05:10      2019-01-29     22:05:10
3     2019-01-29T21:34:30      2019-01-29     21:34:30
4     2019-01-29T21:32:49      2019-01-29     21:32:49

This isn't working because of that funky "T" in the middle of my timestamp, and possibly because of the data type.

I need to remove the T so I can convert this to a standard datetime format, then I need to separate Date and Time into their own columns. BONUS: I'd like to put only the hour into its own column. Instead of 22:08:02, it would just be 22.

Answer 1

You need to use a pandas Timestamp:

>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')

So:

df['StartTime'] = df["StartTime"].apply(lambda x: pd.Timestamp(x))

# Now StartTime has the correct data type, so you can access
# its date and time methods as well as the hour

df['Date'] = df["StartTime"].apply(lambda x: x.date())
df['Time'] = df["StartTime"].apply(lambda x: x.time())
df['Hour'] = df["StartTime"].apply(lambda x: x.hour)
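The per-element lambdas above work, but a vectorized variant is also possible: parse the whole column with pd.to_datetime and read the parts through the .dt accessor. This is a minimal sketch, assuming the column holds ISO 8601 strings like the sample rows in the question (the three values below are copied from that output).

```python
import pandas as pd

# Sample values copied from the question's StartTime column
df = pd.DataFrame({"StartTime": [
    "2019-01-29T22:08:40",
    "2019-01-29T22:08:02",
    "2019-01-29T21:34:30",
]})

# Parse the whole column at once; the ISO 8601 "T" separator is handled automatically
df["StartTime"] = pd.to_datetime(df["StartTime"])

# The .dt accessor gives the same date, time and hour values as the lambdas, vectorized
df["Date"] = df["StartTime"].dt.date
df["Time"] = df["StartTime"].dt.time
df["Hour"] = df["StartTime"].dt.hour

print(df)
```

On 6,000 rows either approach is fast; the .dt version simply avoids a Python-level call per row.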

As mentioned by @coldspeed, calling pd.to_datetime() or pd.Timestamp() works just fine; just omit the format argument.
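It may help to see why the original call failed: the format '%d%b%Y:%H:%M:%S.%f' describes strings like "29Jan2019:22:08:40.000000", so it cannot match "2019-01-29T22:08:40". The "T" itself is not the obstacle. If you prefer to keep an explicit format (for strictness), a sketch like the following, assuming the timestamps never carry fractional seconds or a timezone, writes the "T" as a literal character.

```python
import pandas as pd

s = pd.Series(["2019-01-29T22:08:40", "2019-01-29T21:32:49"])

# The format from the question expects strings like "29Jan2019:22:08:40.000000",
# so it does not match these ISO 8601 values and pandas raises ValueError
try:
    pd.to_datetime(s, format="%d%b%Y:%H:%M:%S.%f")
    bad_format_failed = False
except ValueError as exc:
    bad_format_failed = True
    print("mismatched format:", exc)

# A format that spells out the literal "T" parses the same strings
parsed = pd.to_datetime(s, format="%Y-%m-%dT%H:%M:%S")
print(parsed)
```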

Answer 2

For parsing the timestamp, dateutil is fantastic. It can figure out a date from nearly any string format.

To get just the hour from a datetime object, you can use d.hour.
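A minimal sketch of this approach, assuming the python-dateutil package is available (it is installed as a pandas dependency). It parses one of the question's timestamps with dateutil.parser.parse, reads .hour, and then applies the same idea to a small sample Series.

```python
import pandas as pd
from dateutil import parser

# dateutil infers the layout itself, including the ISO 8601 "T" separator
d = parser.parse("2019-01-29T22:08:40")
print(d)       # 2019-01-29 22:08:40
print(d.hour)  # 22

# Applied per element, this gives the hour for every timestamp string
s = pd.Series(["2019-01-29T22:08:40", "2019-01-29T21:32:49"])
hours = s.apply(lambda x: parser.parse(x).hour)
print(hours.tolist())  # [22, 21]
```

Because dateutil guesses the format per string, it is flexible but slower than pd.to_datetime on a large column.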

Answer 3

You don't need to specify a format for the timestamp. Pandas recognizes ISO 8601 strings such as '2019-01-29T21:34:30' on its own.

IN:

import pandas as pd    
dt = '2019-01-29T21:34:30'    
pd.to_datetime(dt)

OUT:

Timestamp('2019-01-29 21:34:30')
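Since the end goal in the question is a weekday-by-hour heatmap, here is one possible next step once the column is parsed. This is a sketch, assuming each row is one session and the heatmap should show session counts; the column names Weekday and Hour are illustrative, and the fourth sample timestamp is hypothetical. pd.crosstab counts rows per (weekday, hour) pair, and reindex fills in days and hours with no sessions so the grid is always 7 by 24.

```python
import pandas as pd

# Sample rows: the first three come from the question, the fourth is hypothetical
df = pd.DataFrame({"StartTime": [
    "2019-01-29T22:08:40",
    "2019-01-29T22:08:02",
    "2019-01-29T21:34:30",
    "2019-01-28T09:15:00",
]})
df["StartTime"] = pd.to_datetime(df["StartTime"])

# Day of week (Monday=0 ... Sunday=6) for the Y-axis, hour (0-23) for the X-axis
df["Weekday"] = df["StartTime"].dt.dayofweek
df["Hour"] = df["StartTime"].dt.hour

# Count sessions per (weekday, hour) cell, then fill empty days/hours with 0
counts = pd.crosstab(df["Weekday"], df["Hour"])
counts = counts.reindex(index=range(7), columns=range(24), fill_value=0)
print(counts)
```

The resulting table can be passed to whichever plotting library you use for the heatmap.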


3 answers

Answer 1
Score 0, accepted, 2019-01-29 23:13:55

Answer 2
Score 0, 2019-01-29 23:16:17

Answer 3
Score 0, 2019-01-29 23:22:48
