
Date difference in hours (Excel data import)?

I need to calculate the difference in hours between two dates (format: year-month-dayTHH:MM:SS; I could also transform the data to year-month-day HH:MM:SS) from a huge Excel file. What is the most efficient way to do this in Python? I have tried a datetime/time object (TypeError: expected string or buffer), Timestamp (ValueError), and a DataFrame (does not give the result in hours).

Excel File:

Order_Date             Received_Customer   Column3
2000-10-06T13:00:58    2000-11-06T13:00:58    1
2000-10-21T15:40:15    2000-12-27T10:09:29    2
2000-10-23T10:09:29    2000-10-26T10:09:29    3
.....                  ....

datetime/time object code (TypeError: expected string or buffer):

import pandas as pd
import time as t

data=pd.read_excel('/path/file.xlsx')

s1 = (data,['Order_Date'])
s2 = (data,['Received_Customer'])

s1Time = t.strptime(s1, "%Y:%m:%d:%H:%M:%S")
s2Time = t.strptime(s2, "%Y:%m:%d:%H:%M:%S")

deltaInHours = (t.mktime(s2Time) - t.mktime(s1Time))

print deltaInHours, "hours"

Timestamp (ValueError) code:

import pandas as pd
import datetime as dt

data=pd.read_excel('/path/file.xlsx')

df = pd.DataFrame(data,columns=['Order_Date','Received_Customer'])
df.to = [pd.Timestamp('Order_Date')]
df.fr = [pd.Timestamp('Received_Customer')]
(df.fr-df.to).astype('timedelta64[h]')

DataFrame (does not return the desired result)

import pandas as pd

data=pd.read_excel('/path/file.xlsx')

df = pd.DataFrame(data,columns=['Order_Date','Received_Customer'])

df['Order_Date'] = pd.to_datetime(df['Order_Date'])
df['Received_Customer'] = pd.to_datetime(df['Received_Customer'])

answer = df.dropna()['Order_Date'] - df.dropna()['Received_Customer']

answer.astype('timedelta64[h]')

print(answer)

Output:

0   24 days 16:38:07
1    0 days 00:00:00
2   20 days 12:39:52
dtype: timedelta64[ns]

Should be something like this:

0   592 hour
1   0   hour
2   492 hour

Is there another way to convert timedelta64[ns] into hours than answer.astype('timedelta64[h]') ?

In each of your attempts you mixed up data types and methods. I don't have the time to explain each mistake in detail, but I want to help by providing a (probably non-optimal) solution. I built it from your previous attempts and combined it with knowledge from other questions, such as:

Convert a timedelta to days, hours and minutes

Get total number of hours from a Pandas Timedelta?

Note that I used Python 3. I hope this solution points you in the right direction. Here it is:

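import pandas as pd
import numpy as np

# Read the sheet; the first row holds the column names
data = pd.read_excel('C:\\Users\\nrieble\\Desktop\\check.xlsx', header=0)

# Parse each cell to a Timestamp, skipping short or empty entries (e.g. NaN)
start = [pd.to_datetime(e) for e in data['Order_Date'] if len(str(e)) > 4]
end = [pd.to_datetime(e) for e in data['Received_Customer'] if len(str(e)) > 4]

# Element-wise difference; assumes both lists ended up with the same length
delta = np.asarray(end) - np.asarray(start)
# Dividing by a one-hour timedelta converts each difference to float hours
deltainhours = [e / np.timedelta64(1, 'h') for e in delta]

print(deltainhours, "hours")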
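
For a large file, a vectorized variant may be faster than the list comprehensions above. The following is only a sketch: it builds a small DataFrame from the sample rows in the question instead of reading the real Excel file, and the column name `hours` is a hypothetical choice, not something from the original post.

```python
import pandas as pd

# Sample rows copied from the question; in practice this frame would
# come from pd.read_excel(...) on the real file (assumption).
df = pd.DataFrame({
    "Order_Date": ["2000-10-06T13:00:58", "2000-10-21T15:40:15", "2000-10-23T10:09:29"],
    "Received_Customer": ["2000-11-06T13:00:58", "2000-12-27T10:09:29", "2000-10-26T10:09:29"],
})

# Parse both columns; errors="coerce" turns unparseable cells into NaT.
for col in ("Order_Date", "Received_Customer"):
    df[col] = pd.to_datetime(df[col], errors="coerce")

# Subtracting two datetime columns gives a timedelta64[ns] Series.
delta = df["Received_Customer"] - df["Order_Date"]

# Dividing by a one-hour Timedelta yields float hours.
# "hours" is a hypothetical column name used for illustration.
df["hours"] = delta / pd.Timedelta(hours=1)

print(df["hours"])
```

Note that the result is assigned to a new column. `Series.astype` likewise returns a new Series rather than modifying the original, which is likely why the DataFrame attempt in the question still printed `timedelta64[ns]` values. `delta.dt.total_seconds() / 3600` should give the same numbers.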


 