I need to read an xlsx file into a pandas dataframe, do some processing on it, and save it back as an xlsx file.
This is how I do it:
import pandas as pd
from openpyxl import load_workbook
from datetime import datetime
path = r'D:\Test\Test.xlsx'
path2 = r'D:\Test\TestResult.xlsx'
dataFrame = pd.read_excel(path, sheet_name=0, index_col=None, na_values=['NA'])
print(dataFrame.dtypes)
dataFrame.Hours = pd.to_datetime(dataFrame.Hours, format='%H:%M:%S').dt.time
print(dataFrame.dtypes)
book = load_workbook(path)
firstSheetName = book.sheetnames[0]
ws = book[firstSheetName]  # get_sheet_by_name() is deprecated in openpyxl
book.remove(ws)
book.create_sheet(firstSheetName, 0)
writer = pd.ExcelWriter(path2, engine='openpyxl', date_format='yyyy-mm-dd')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
dataFrame.to_excel(writer, sheet_name=firstSheetName, index=False)
writer.save()
So far everything works fine: the file gets saved, etc. The problem is that in the base file Test.xlsx my "Hours" column has a custom format (hh:mm:ss), and when I import it into the dataframe it is recognized as "object" dtype. Also, when I save that data back to Excel as TestResult.xlsx, this column changes to "General".
I tried to change the "object" dtype in the dataframe to "datetime" using the code below, but it has no effect; Hours still shows up as "object":
dataFrame.Hours = pd.to_datetime(dataFrame.Hours, format='%H:%M:%S').dt.time
So what I need help with is: how do I save that dataframe back to an xlsx file with the "Hours" column set to the custom format "hh:mm:ss"?
The Excel file is Test.xlsx, and this is how it looks inside:
https://docs.google.com/spreadsheets/d/1uu7g7xmMKy51BHpy0Up3T47VTHwtH9U_9PdlBSlaK80/edit?usp=sharing
The "Hours" column has the custom format "hh:mm:ss".
Delete .dt.time, and the column can be converted to datetime64.
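To illustrate the point, here is a minimal sketch that uses a small made-up frame in place of Test.xlsx (the sample values are an assumption, not data from the question). It shows that .dt.time produces plain Python time objects, which pandas stores as "object", while omitting it keeps a datetime64 column:

```python
import pandas as pd

# Hypothetical sample standing in for the "Hours" column of Test.xlsx.
df = pd.DataFrame({"Hours": ["23:59:50", "06:00:00"]})

# With .dt.time each value becomes a datetime.time, so the dtype is object.
as_time = pd.to_datetime(df["Hours"], format="%H:%M:%S").dt.time

# Without .dt.time the column keeps a datetime64 dtype.
as_datetime = pd.to_datetime(df["Hours"], format="%H:%M:%S")

print(as_time.dtype == object)
print(pd.api.types.is_datetime64_any_dtype(as_datetime))
```

Because no date is given in the format, the parsed values get the default date 1900-01-01.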
You can take advantage of both the date_format and datetime_format parameters of pd.ExcelWriter() [pandas-doc].
Just format the columns accordingly. In your case, convert FromDate and ToDate to datetime.date objects, and Hours to datetime.datetime objects.
df['FromDate'] = df['FromDate'].dt.date
df['ToDate'] = df['ToDate'].dt.date
df['Hours'] = pd.to_datetime(df['Hours'], format='%H:%M:%S')
And then specify the output format:
pd.ExcelWriter(path2, engine='openpyxl', )
OK, after hours of trying I have found a solution. Big thanks go to @afonso for helping me convert that string to the datetime type.
The problem I still had was that after conversion Python was setting the date to "1900-01-01 23:59:50", and Excel was reading it as 1.324324243 (a date-and-time value) instead of 0.1234325 (a time-only value, which as a date looks like "1900-0-0 23:59:50").
So what I did was to use the Excel "bug" of not being able to read dates before the year 1900 and subtract one day from my Python datetime using this code:
dataFrame['Hours'] = dataFrame['Hours'] + pd.Timedelta(days=-1)
This resulted in sending the date "1899-12-31 23:59:50" to Excel, and since Excel couldn't read that as a date, it automatically changed it to "1900-01-00 23:59:50". This solved my problem, because that was exactly the format I had in my input from Excel.
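The arithmetic behind this workaround can be sketched as follows, assuming Excel's 1900 date system, in which 1900-01-01 is serial number 1 and the fractional part is the time of day. The helper excel_serial and the sample values are hypothetical and only model dates before March 1900; they are not part of pandas or openpyxl:

```python
import pandas as pd

# In the 1900 date system, serial 1 is 1900-01-01, so serial 0 falls on
# 1899-12-31 (valid for dates before 1900-03-01 only).
EXCEL_EPOCH = pd.Timestamp("1899-12-31")

def excel_serial(ts):
    """Hypothetical helper: days elapsed since the serial-0 date, as a float."""
    return (ts - EXCEL_EPOCH) / pd.Timedelta(days=1)

# Hypothetical sample values; parsing without a date defaults to 1900-01-01.
hours = pd.to_datetime(pd.Series(["23:59:50", "06:00:00"]), format="%H:%M:%S")
shifted = hours + pd.Timedelta(days=-1)

before = hours.map(excel_serial)   # 1.x: a date plus a time
after = shifted.map(excel_serial)  # 0.x: a time only

print(before.tolist())
print(after.tolist())
```

After the shift, every value is below 1, which is how Excel represents a pure time of day.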
Big thanks to everyone for help.