How to Convert Abnormal Timestamp into datetime in Pandas dataframe
I'm creating a usage heatmap for some user analytics. The Y-axis will be day of the week and the X-axis will be hour of the day (24:00). I pulled the data from the API. (Note that this actually produces 6,000 rows of data.)
IN:
import requests
import pandas as pd
response = requests.get("api.url")
data = response.json()
df=pd.DataFrame(data['Sessions'])
df.dtypes
print(df['StartTime'])
OUT:
0 2019-01-29T22:08:40
1 2019-01-29T22:08:02
2 2019-01-29T22:05:10
3 2019-01-29T21:34:30
4 2019-01-29T21:32:49
Name: StartTime, Length: 100, dtype: object
I would normally convert the object column to a pandas datetime and then split it into two columns:
IN:
df['StartTime'] = pd.to_datetime(df['StartTime'], format='%d%b%Y:%H:%M:%S.%f')
df['Date'] = [d.date() for d in df['StartTime']]
df['Time'] = [d.time() for d in df['StartTime']]
OUT:
  StartTime Date Time
0 2019-01-29T22:08:40 2019-01-29 22:08:40
1 2019-01-29T22:08:02 2019-01-29 22:08:02
2 2019-01-29T22:05:10 2019-01-29 22:05:10
3 2019-01-29T21:34:30 2019-01-29 21:34:30
4 2019-01-29T21:32:49 2019-01-29 21:32:49
This isn't working because of that funky "T" in the middle of my timestamp and possibly because of the datatype.
I need to remove the T so I can convert this to a standard datetime format, then I need to separate Date and Time into their own columns. BONUS: I'd like to bring only the hour into its own column. Instead of 22:08:02, it would just be 22.
You need to use pandas Timestamp:
>>> pd.Timestamp('2017-01-01T12')
Timestamp('2017-01-01 12:00:00')
So:
df['StartTime'] = df["StartTime"].apply(lambda x: pd.Timestamp(x))
#now StartTime has the correct data type so you can access
# date and time methods as well as the hour
df['Date'] = df["StartTime"].apply(lambda x: x.date())
df['Time'] = df["StartTime"].apply(lambda x: x.time())
df['Hour'] = df["StartTime"].apply(lambda x: x.hour)
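The same columns can also be built without per-row lambdas. The following is a minimal sketch, assuming the StartTime strings look like the ones printed in the question; the three-row frame is only a stand-in for the real API data.

```python
import pandas as pd

# Sample rows copied from the question's output; the real frame comes from the API.
df = pd.DataFrame({"StartTime": [
    "2019-01-29T22:08:40",
    "2019-01-29T22:08:02",
    "2019-01-29T21:34:30",
]})

# Parse the whole column at once; the ISO 8601 "T" separator is understood.
df["StartTime"] = pd.to_datetime(df["StartTime"])

# The .dt accessor exposes date and time parts without a Python-level lambda.
df["Date"] = df["StartTime"].dt.date
df["Time"] = df["StartTime"].dt.time
df["Hour"] = df["StartTime"].dt.hour

print(df)
```

On 6,000 rows either approach is fast enough; the .dt accessor mainly keeps the code shorter.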
As mentioned by @coldspeed, calling pd.to_datetime() or pd.Timestamp() would work just fine; just omit the format argument.
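To illustrate why the original call failed, here is a small sketch. The format in the question, '%d%b%Y:%H:%M:%S.%f', describes strings such as "29Jan2019:22:08:40.000000", not the ISO 8601 strings the API returns. The Series below is sample data taken from the question's output, and the explicit format shown is one that matches it.

```python
import pandas as pd

s = pd.Series(["2019-01-29T22:08:40", "2019-01-29T21:32:49"])

# Option 1: omit format and let pandas infer the ISO 8601 layout.
inferred = pd.to_datetime(s)

# Option 2: pass a format that actually matches the strings,
# with the literal "T" written between the date and time parts.
explicit = pd.to_datetime(s, format="%Y-%m-%dT%H:%M:%S")

# The format from the question does not match this data,
# so pandas raises ValueError instead of parsing it.
try:
    pd.to_datetime(s, format="%d%b%Y:%H:%M:%S.%f")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print((inferred == explicit).all(), mismatch_raised)
```

So the "T" never needs to be removed; the problem was only the mismatched format string.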
You don't need to specify a format for the timestamp. pandas recognizes ISO 8601 strings such as '2019-01-29T21:34:30'.
IN:
import pandas as pd
dt = '2019-01-29T21:34:30'
pd.to_datetime(dt)
OUT:
Timestamp('2019-01-29 21:34:30')
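For the heatmap described in the question, the parsed column can be turned into a day-of-week by hour count table. This is only a sketch under assumptions: the five rows are the sample values from the question, the column names Day and Hour are chosen here for illustration, and pd.crosstab is one of several ways to count sessions per cell.

```python
import pandas as pd

# Sample rows from the question; the real data has about 6,000 sessions.
df = pd.DataFrame({"StartTime": [
    "2019-01-29T22:08:40",
    "2019-01-29T22:08:02",
    "2019-01-29T22:05:10",
    "2019-01-29T21:34:30",
    "2019-01-29T21:32:49",
]})
df["StartTime"] = pd.to_datetime(df["StartTime"])

# Day name for the Y-axis, hour of day (0-23) for the X-axis.
df["Day"] = df["StartTime"].dt.day_name()
df["Hour"] = df["StartTime"].dt.hour

# Count sessions in each (day, hour) cell.
counts = pd.crosstab(df["Day"], df["Hour"])
print(counts)
```

The resulting table can be passed to whichever plotting library draws the heatmap.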