A faster way to create a datetime column from existing date and time columns in a Pandas DataFrame

Question

I have a Pandas DataFrame with Year, Month, Day, and Time columns, and I'm trying to combine them into a new column that holds a single datetime object per row. The data type of each column is int, including the time column (it ranges between 1 and 2359). For example: 2015, 3, 15, 745 would be March 15, 2015 at 7:45 AM.
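To make the encoding concrete, here is a minimal sketch of how one such row maps to a datetime. The helper name `split_hhmm` is made up for illustration and is not part of pandas or the data set.

```python
import datetime

def split_hhmm(hhmm):
    """Split an integer time such as 745 into (hour, minute)."""
    # divmod(745, 100) returns (7, 45): quotient is the hour, remainder the minute
    return divmod(hhmm, 100)

hour, minute = split_hhmm(745)
example = datetime.datetime(2015, 3, 15, hour, minute)
print(example)  # 2015-03-15 07:45:00
```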

I currently just do this, but it takes several minutes to run on my dataframe, which has 58000 rows:

for i in range(len(flights.index)):
    flights['SCHEDULED_DEPARTURE_DATETIME'][i] = datetime.datetime(
        flights.iloc[i]['YEAR'], 
        flights.iloc[i]['MONTH'], 
        flights.iloc[i]['DAY'], 
        int(np.floor(flights.iloc[i]['SCHEDULED_DEPARTURE']/100)), #hours
        flights.iloc[i]['SCHEDULED_DEPARTURE']%100                 #minutes
    )

There must be a faster, more Pythonic way to do this, but I can't seem to get it to work with apply. What am I missing?
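For reference, a row-wise apply version might look like the sketch below. The small `flights` frame and the helper `row_to_datetime` are stand-ins invented for illustration; only the column names come from the loop above. Note that apply with axis=1 still calls a Python function once per row, so it is unlikely to be as fast as a fully vectorized approach.

```python
import datetime
import pandas as pd

# Tiny stand-in for the flights frame; the values are made up for illustration.
flights = pd.DataFrame({
    'YEAR': [2015, 2015],
    'MONTH': [3, 3],
    'DAY': [15, 16],
    'SCHEDULED_DEPARTURE': [745, 2359],
})

def row_to_datetime(row):
    """Build one datetime from a single row (hypothetical helper)."""
    hour, minute = divmod(int(row['SCHEDULED_DEPARTURE']), 100)
    return datetime.datetime(
        int(row['YEAR']), int(row['MONTH']), int(row['DAY']), hour, minute
    )

# axis=1 passes each row to the function as a Series
flights['SCHEDULED_DEPARTURE_DATETIME'] = flights.apply(row_to_datetime, axis=1)
print(flights['SCHEDULED_DEPARTURE_DATETIME'])
```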

FYI, my dataframe is a small subset of this data set from Kaggle: https://www.kaggle.com/usdot/flight-delays#flights.csv

Answer 1

You could use pd.to_datetime() like this:

import pandas as pd
import numpy as np

data = pd.DataFrame(np.array(
    [
        [2018, 10, 1, 2359],
        [2018, 10, 1, 1500],
        [2018, 10, 1, 900],
        [2018, 10, 1, 1],
        [2018, 10, 1, 0]
    ]
), columns = ['year', 'month', 'day', 'scheduled_departure'])

data['hour'] = data['scheduled_departure'] // 100  # integer division keeps the hours as ints

data['minute'] = data['scheduled_departure'] % 100

data['scheduled_departure_datetime'] = pd.to_datetime(data[['year', 'month', 'day', 'hour', 'minute']])

print(data['scheduled_departure_datetime'])

Giving:

0   2018-10-01 23:59:00
1   2018-10-01 15:00:00
2   2018-10-01 09:00:00
3   2018-10-01 00:01:00
4   2018-10-01 00:00:00
Name: scheduled_departure_datetime, dtype: datetime64[ns]

I haven't tested the speed, but I imagine this will be faster, since pd.to_datetime() builds the whole column in one call instead of looping over the rows in Python.
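If you want to check the speed yourself, a rough timing sketch could look like this. It assumes the upper-case column names from the question and uses randomly generated data rather than the Kaggle file; the function names `make_flights`, `row_by_row`, and `vectorized` are made up for illustration. The row count is kept small so the loop finishes quickly; raise it toward 58000 to mimic the original frame.

```python
import datetime
import time

import numpy as np
import pandas as pd

def make_flights(n, seed=0):
    """Build a synthetic frame shaped like the question's (random values)."""
    rng = np.random.default_rng(seed)
    hours = rng.integers(0, 24, size=n)
    minutes = rng.integers(0, 60, size=n)
    return pd.DataFrame({
        'YEAR': np.full(n, 2015),
        'MONTH': rng.integers(1, 13, size=n),
        'DAY': rng.integers(1, 29, size=n),  # days 1..28 exist in every month
        'SCHEDULED_DEPARTURE': hours * 100 + minutes,
    })

def row_by_row(flights):
    """Per-row loop, similar in spirit to the question's code."""
    out = []
    for i in range(len(flights.index)):
        row = flights.iloc[i]
        hour, minute = divmod(int(row['SCHEDULED_DEPARTURE']), 100)
        out.append(datetime.datetime(
            int(row['YEAR']), int(row['MONTH']), int(row['DAY']), hour, minute
        ))
    return pd.Series(out, index=flights.index)

def vectorized(flights):
    """Assemble the column in one pd.to_datetime() call."""
    parts = pd.DataFrame({
        'year': flights['YEAR'],
        'month': flights['MONTH'],
        'day': flights['DAY'],
        'hour': flights['SCHEDULED_DEPARTURE'] // 100,
        'minute': flights['SCHEDULED_DEPARTURE'] % 100,
    })
    return pd.to_datetime(parts)

flights = make_flights(2000)

t0 = time.perf_counter()
slow = row_by_row(flights)
t1 = time.perf_counter()
fast = vectorized(flights)
t2 = time.perf_counter()

print(f"row by row: {t1 - t0:.4f} s")
print(f"vectorized: {t2 - t1:.4f} s")
print("same result:", bool((slow == fast).all()))
```

The actual timings depend on your machine and pandas version, so treat the numbers as a rough comparison only.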

Answer 1 (0, accepted, 2018-11-05 02:56:40)