
Create Pandas Datetime index from 8 digit date and 2,3, and 4 digit time

New to python/pandas as well as to stackoverflow. Currently using Spyder 2.3.1 from Anaconda.

I'm working with a CSV data set which provides the date and time as follows:

Date,Time
20140101,54
20140102,154
20140103,1654

I am currently reading in the date and parsing using read_csv as below:

df = pd.read_csv('filename.csv',                     
      index_col = 0,
      parse_dates= True, infer_datetime_format = True)

which yields

Datetimeindex        Time
2014-01-01 00:00:00  54
2014-01-02 00:00:00  154
2014-01-03 00:00:00  1654

Now I need to replace the timestamp for each row of my table with the actual time to yield:

Datetimeindex
2014-01-01 00:54:00
2014-01-02 01:54:00
2014-01-03 16:54:00

Could anyone provide an efficient method of achieving this result?

My method so far is:

import pandas as pd

length = len(df["Time"])
for i in range(0, length):
    if len(str(df.iloc[i]["Time"])) == 2:
        string = str(df.iloc[i]["Time"])
        hour = "00"
        minute = string
        second = "00"
        # replace time with actual time using hour, minute, and second variables
    if len(str(df.iloc[i]["Time"])) == 3:
        string = str(df.iloc[i]["Time"])
        hour = "0" + string[:1]
        minute = string[1:]
        second = "00"
        # replace time with actual time using hour, minute, and second variables
    if len(str(df.iloc[i]["Time"])) == 4:
        string = str(df.iloc[i]["Time"])
        hour = string[:2]
        minute = string[2:]
        second = "00"
        # replace time with actual time using hour, minute, and second variables

and I figured I would use the method from this thread to put in something like df.index[i] = df.index.map(lambda t: t.replace(hour=hour, minute=minute, day=day)) inside each if statement.

This obviously doesn't work, and I'm sure it is wildly inefficient. Any help is appreciated.

Thank you.

Well, you can certainly make your code more efficient by padding all of the time values with zeros, which avoids having to test the length of the time on every row. I created a CSV file called time_test.csv and imported both columns as string data. I then created an empty list to hold the datetimes and iterated over the DataFrame; for each row I padded the time with zeros as needed using a while loop, then passed the pieces to datetime.datetime:

import datetime
import pandas as pd

# Read both columns as strings so the time can be zero-padded
DF = pd.read_csv('time_test.csv', dtype={'Date': str, 'Time': str})
datetime_index = []

for row in DF.index:
    time_val = DF.loc[row, 'Time']
    date_val = DF.loc[row, 'Date']
    while len(time_val) < 4:  # pad with zeros as needed to avoid conditional testing
        time_val = '0' + time_val
    datetime_index.append(datetime.datetime(
        int(date_val[:4]), int(date_val[4:6]), int(date_val[6:]),  # year, month, day
        int(time_val[:2]), int(time_val[2:]), 0))                  # hour, minute, second

DF['Datetime'] = pd.Series(datetime_index, index = DF.index)

results in:

In [36]: DF
Out[36]:
       Date  Time            Datetime
0  20140101    54 2014-01-01 00:54:00
1  20140102   154 2014-01-02 01:54:00
2  20140103  1654 2014-01-03 16:54:00
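
The same zero-padding idea can probably be applied to whole columns at once rather than row by row. The following is only a sketch, not part of the original answer: it assumes the times are always HHMM with the leading zeros dropped, it inlines the sample data from the question through io.StringIO in place of a real file, and the names csv_text and padded_time are made up for illustration.

```python
import io
import pandas as pd

# Sample data copied from the question; a real file path would replace io.StringIO.
csv_text = "Date,Time\n20140101,54\n20140102,154\n20140103,1654\n"

# Read both columns as strings so the time values can be padded.
df = pd.read_csv(io.StringIO(csv_text), dtype={"Date": str, "Time": str})

# Left-pad every time to 4 digits (assumed HHMM), e.g. "54" -> "0054".
padded_time = df["Time"].str.zfill(4)

# Concatenate date and time, then parse the whole column in one call.
df.index = pd.DatetimeIndex(
    pd.to_datetime(df["Date"] + padded_time, format="%Y%m%d%H%M"),
    name="Datetime",
)
```

Passing an explicit format string means pandas does not have to guess the layout, and it raises an error if a row does not match the assumed YYYYMMDDHHMM pattern.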


 