Faster way of creating a datetime column from existing date and time columns in a Pandas DataFrame
I have a Pandas DataFrame with Year, Month, Day, and Time columns, and I'm trying to combine them into a new column that holds a single datetime object. The data type of each column is int, including the time column (it ranges from 1 to 2359). For example: 2015, 3, 15, 745 would be March 15, 2015 at 7:45 AM.
I currently just do this, but it takes several minutes to run on my dataframe, which has 58000 rows:
for i in range(len(flights.index)):
    flights['SCHEDULED_DEPARTURE_DATETIME'][i] = datetime.datetime(
        flights.iloc[i]['YEAR'],
        flights.iloc[i]['MONTH'],
        flights.iloc[i]['DAY'],
        int(np.floor(flights.iloc[i]['SCHEDULED_DEPARTURE'] / 100)),  # hours
        flights.iloc[i]['SCHEDULED_DEPARTURE'] % 100  # minutes
    )
There must be a faster, more pythonic way to do this, but I can't seem to get it to work with apply. What am I missing?
FYI, my dataframe is a small subset of this data set from Kaggle: https://www.kaggle.com/usdot/flight-delays#flights.csv
You could use pd.to_datetime() like this:
import pandas as pd
import numpy as np

data = pd.DataFrame(
    np.array(
        [
            [2018, 10, 1, 2359],
            [2018, 10, 1, 1500],
            [2018, 10, 1, 900],
            [2018, 10, 1, 1],
            [2018, 10, 1, 0],
        ]
    ),
    columns=['year', 'month', 'day', 'scheduled_departure'],
)
data['hour'] = np.floor(data['scheduled_departure'] / 100)
data['minute'] = data['scheduled_departure'] % 100
data['scheduled_departure_datetime'] = pd.to_datetime(data[['year', 'month', 'day', 'hour', 'minute']])
print(data['scheduled_departure_datetime'])
Giving:
0 2018-10-01 23:59:00
1 2018-10-01 15:00:00
2 2018-10-01 09:00:00
3 2018-10-01 00:01:00
4 2018-10-01 00:00:00
Name: scheduled_departure_datetime, dtype: datetime64[ns]
I haven't tested the speed, but I imagine this will be faster, since the conversion runs on whole columns at once instead of looping over rows in Python.
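For the column names in the question, a minimal sketch might look like the following. The three-row sample frame and the intermediate name `parts` are made up for illustration, and integer division (`//`) is swapped in for `np.floor` so the hour stays an int; the approach is otherwise the same `pd.to_datetime()` call on year/month/day/hour/minute columns.

```python
import pandas as pd

# Hypothetical sample mirroring the question's column names,
# with departure times stored as integers in HHMM form.
flights = pd.DataFrame({
    "YEAR": [2015, 2015, 2015],
    "MONTH": [3, 3, 12],
    "DAY": [15, 15, 31],
    "SCHEDULED_DEPARTURE": [745, 5, 2359],
})

# pd.to_datetime can assemble datetimes from a DataFrame whose columns
# are named year/month/day (plus optional hour/minute), so build a
# temporary frame with those names.
parts = pd.DataFrame({
    "year": flights["YEAR"],
    "month": flights["MONTH"],
    "day": flights["DAY"],
    "hour": flights["SCHEDULED_DEPARTURE"] // 100,   # 745 -> 7
    "minute": flights["SCHEDULED_DEPARTURE"] % 100,  # 745 -> 45
})

flights["SCHEDULED_DEPARTURE_DATETIME"] = pd.to_datetime(parts)
print(flights["SCHEDULED_DEPARTURE_DATETIME"])
```

Building a separate `parts` frame keeps the helper hour and minute columns out of `flights`; if you don't mind the extra columns, you could assign them directly as in the example above.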