Faster way of creating a datetime column from existing date and time columns in Pandas DataFrame

I have a Pandas DataFrame with Year, Month, Day, and Time columns, and I'm trying to combine them into a new column holding a single datetime object. Every column is an int, including the time column, which encodes the time of day as HHMM (values range from 1 to 2359). For example, 2015, 3, 15, 745 would be March 15, 2015 at 7:45 AM.
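The HHMM encoding can be split into hours and minutes with integer division and remainder by 100. A minimal sketch; the helper name `split_hhmm` is hypothetical and not part of the question:

```python
import datetime


def split_hhmm(hhmm: int):
    """Split an HHMM-encoded integer (e.g. 745) into (hours, minutes)."""
    # divmod returns (hhmm // 100, hhmm % 100) in one call
    return divmod(hhmm, 100)


hours, minutes = split_hhmm(745)
print(hours, minutes)  # 7 45
print(datetime.datetime(2015, 3, 15, hours, minutes))  # 2015-03-15 07:45:00
```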

I currently do this, but it takes several minutes to run on my DataFrame of 58,000 rows:

import datetime

# Slow: one Python-level iteration (and one single-cell write) per row
for i in range(len(flights.index)):
    row = flights.iloc[i]
    hhmm = int(row['SCHEDULED_DEPARTURE'])
    flights.loc[flights.index[i], 'SCHEDULED_DEPARTURE_DATETIME'] = datetime.datetime(
        int(row['YEAR']),
        int(row['MONTH']),
        int(row['DAY']),
        hhmm // 100,  # hours
        hhmm % 100,   # minutes
    )

There must be a faster, more Pythonic way to do this, but I can't get it to work with apply. What am I missing?
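For reference, a row-wise `apply` version (presumably what the question was aiming for) might look like the sketch below. It assumes the question's column names and uses a tiny made-up frame. Note that `apply(axis=1)` still calls a Python function once per row, so it mainly removes the per-cell write overhead and is usually still much slower than a fully vectorized approach.

```python
import datetime

import pandas as pd

# Tiny made-up stand-in for the question's `flights` frame
flights = pd.DataFrame({
    'YEAR': [2015, 2015],
    'MONTH': [3, 3],
    'DAY': [15, 16],
    'SCHEDULED_DEPARTURE': [745, 2359],
})


def row_to_datetime(row):
    # Build one datetime from a single row; HHMM is split with divmod
    hours, minutes = divmod(int(row['SCHEDULED_DEPARTURE']), 100)
    return datetime.datetime(int(row['YEAR']), int(row['MONTH']), int(row['DAY']), hours, minutes)


# axis=1 passes each row as a Series; the results are assigned as one whole column
flights['SCHEDULED_DEPARTURE_DATETIME'] = flights.apply(row_to_datetime, axis=1)
print(flights['SCHEDULED_DEPARTURE_DATETIME'])
```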

FYI, my DataFrame is a small subset of this data set from Kaggle: https://www.kaggle.com/usdot/flight-delays#flights.csv

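You could use pd.to_datetime(), which can assemble datetimes from a DataFrame whose columns are named year, month and day (plus optional hour, minute and second), like this: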

import pandas as pd
import numpy as np

data = pd.DataFrame(np.array(
    [
        [2018, 10, 1, 2359],
        [2018, 10, 1, 1500],
        [2018, 10, 1, 900],
        [2018, 10, 1, 1],
        [2018, 10, 1, 0]
    ]
), columns = ['year', 'month', 'day', 'scheduled_departure'])

# Integer division and remainder split HHMM into hours and minutes (e.g. 745 -> 7, 45)
data['hour'] = data['scheduled_departure'] // 100

data['minute'] = data['scheduled_departure'] % 100

data['scheduled_departure_datetime'] = pd.to_datetime(data[['year', 'month', 'day', 'hour', 'minute']])

print(data['scheduled_departure_datetime'])

Output:

0   2018-10-01 23:59:00
1   2018-10-01 15:00:00
2   2018-10-01 09:00:00
3   2018-10-01 00:01:00
4   2018-10-01 00:00:00
Name: scheduled_departure_datetime, dtype: datetime64[ns]
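Applied to the question's own upper-case columns, the same idea might look like the sketch below (the sample rows are made up for illustration). Since `pd.to_datetime` looks for component names such as year, month, day, hour and minute, this sketch builds a separate helper frame with exactly those names, which also avoids adding temporary hour/minute columns to `flights`.

```python
import pandas as pd

# Made-up sample using the question's column names
flights = pd.DataFrame({
    'YEAR': [2015, 2015, 2015],
    'MONTH': [3, 3, 12],
    'DAY': [15, 15, 31],
    'SCHEDULED_DEPARTURE': [745, 1, 2359],
})

# Vectorized HHMM split: integer division and remainder on whole columns
parts = pd.DataFrame({
    'year': flights['YEAR'],
    'month': flights['MONTH'],
    'day': flights['DAY'],
    'hour': flights['SCHEDULED_DEPARTURE'] // 100,
    'minute': flights['SCHEDULED_DEPARTURE'] % 100,
})

# One call assembles the whole datetime column
flights['SCHEDULED_DEPARTURE_DATETIME'] = pd.to_datetime(parts)
print(flights['SCHEDULED_DEPARTURE_DATETIME'])
```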

I haven't benchmarked it, but since this works on whole columns instead of looping over rows in Python, I expect it to be much faster.
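To check the speed claim yourself, a rough timing sketch on synthetic data could look like this. The data is random (58,000 rows to match the question's size, days capped at 28 so every date is valid), and the loop version uses `itertuples`, which is already faster than the question's `iloc`-based loop. No timings are claimed here; results depend on the machine and pandas version.

```python
import datetime
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 58_000
flights = pd.DataFrame({
    'YEAR': np.full(n, 2015),
    'MONTH': rng.integers(1, 13, n),
    'DAY': rng.integers(1, 29, n),  # 1..28, valid in every month
    'SCHEDULED_DEPARTURE': rng.integers(0, 24, n) * 100 + rng.integers(0, 60, n),
})


def loop_version(df):
    # Python-level loop: one datetime object built per row
    out = []
    cols = df[['YEAR', 'MONTH', 'DAY', 'SCHEDULED_DEPARTURE']]
    for year, month, day, hhmm in cols.itertuples(index=False):
        out.append(datetime.datetime(int(year), int(month), int(day), int(hhmm) // 100, int(hhmm) % 100))
    return pd.Series(out, index=df.index)


def vectorized_version(df):
    # Whole-column arithmetic plus a single pd.to_datetime call
    parts = pd.DataFrame({
        'year': df['YEAR'],
        'month': df['MONTH'],
        'day': df['DAY'],
        'hour': df['SCHEDULED_DEPARTURE'] // 100,
        'minute': df['SCHEDULED_DEPARTURE'] % 100,
    })
    return pd.to_datetime(parts)


for fn in (loop_version, vectorized_version):
    start = time.perf_counter()
    fn(flights)
    print(f'{fn.__name__}: {time.perf_counter() - start:.3f} s')

# Both approaches should produce identical timestamps
assert (loop_version(flights) == vectorized_version(flights)).all()
```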
