
Python Pandas and excel xlsx cell formats

What I need to do is load an xlsx file into a pandas dataframe, do some things with it, and save it back as an xlsx file.

This is how I do it:

import pandas as pd
from openpyxl import load_workbook
from datetime import datetime

path = r'D:\Test\Test.xlsx'
path2 = r'D:\Test\TestResult.xlsx'

dataFrame = pd.read_excel(path, sheet_name=0, index_col=None, na_values=['NA'])
print(dataFrame.dtypes)

dataFrame.Hours = pd.to_datetime(dataFrame.Hours, format='%H:%M:%S').dt.time
print(dataFrame.dtypes)

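# Open the source workbook and replace its first sheet with an empty one
book = load_workbook(path)
firstSheetName = book.sheetnames[0]
ws = book[firstSheetName]  # get_sheet_by_name() is deprecated in openpyxl
book.remove(ws)
book.create_sheet(firstSheetName, 0)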
writer = pd.ExcelWriter(path2, engine='openpyxl', date_format='yyyy-mm-dd')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
dataFrame.to_excel(writer, sheet_name=firstSheetName, index=False)

writer.save()

So far everything works fine: the file gets saved, etc. The problem is that in the base file Test.xlsx my "Hours" column has a custom format (hh:mm:ss), and when I import it into the dataframe it is recognized as "object" type. Also, when I save that data back to Excel as TestResult.xlsx, this column changes to "General".

I tried to change the "object" type in the dataframe to a "datetime" type using the code below, but it has no effect; Hours still shows up as "object":

dataFrame.Hours = pd.to_datetime(dataFrame.Hours, format='%H:%M:%S').dt.time

So what I need help with is: how do I save that dataframe back to an Excel xlsx file with the "Hours" column set to the custom format "hh:mm:ss"?

The Excel file is Test.xlsx, and this is how it looks inside:

https://docs.google.com/spreadsheets/d/1uu7g7xmMKy51BHpy0Up3T47VTHwtH9U_9PdlBSlaK80/edit?usp=sharing

The "Hours" column has the custom format "hh:mm:ss".

Remove .dt.time. It turns each value into a datetime.time object, which pandas stores as "object"; without it the column is converted to datetime64.
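A minimal sketch of the difference, using a made-up two-row frame (the sample values are assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical sample standing in for the "Hours" column read from Excel
df = pd.DataFrame({'Hours': ['01:30:00', '23:59:50']})

# With .dt.time each element becomes a datetime.time, so the dtype is object
as_time = pd.to_datetime(df['Hours'], format='%H:%M:%S').dt.time

# Without .dt.time the column keeps a datetime64 dtype
as_datetime = pd.to_datetime(df['Hours'], format='%H:%M:%S')

print(as_time.dtype)
print(as_datetime.dtype)

# A time-only format string leaves the date part at its default, 1900-01-01
print(as_datetime.iloc[1])
```

The default date of 1900-01-01 matters once the values are written to Excel, because Excel stores a datetime as a single number that includes the date.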

You can take advantage of both the date_format and datetime_format parameters of pd.ExcelWriter() (see the pandas documentation).

Just convert the columns accordingly: in your case, FromDate and ToDate to datetime.date objects, and Hours to datetime.datetime objects.

df['FromDate'] = df['FromDate'].dt.date
df['ToDate'] = df['ToDate'].dt.date
df['Hours'] = pd.to_datetime(df['Hours'], format='%H:%M:%S')

And then specify the output format:

pd.ExcelWriter(path2, engine='openpyxl', date_format='yyyy-mm-dd', datetime_format='hh:mm:ss')

OK, after hours of trying I have found a solution. Big thanks go to @afonso for helping me convert that string to a datetime type.

The problem I still had was that after the conversion Python set the date to "1900-01-01 23:59:50", and Excel read it as 1.324324243 (a date-and-time value) instead of 0.1234325 (a time-only value, which as a date looks like "1900-01-00 23:59:50").

So what I did was exploit Excel's "bug" of not being able to read dates before the year 1900 and subtract one day from my Python datetime using this code:

dataFrame['Hours'] = dataFrame['Hours'] + pd.Timedelta(days=-1)

This sent the date "1899-12-31 23:59:50" to Excel, and since Excel couldn't read it as a date, it automatically changed it to "1900-01-00 23:59:50". That solved my problem, because that was exactly the format I had in my input from Excel.
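One way to see why the one-day shift helps is to work out the serial number Excel stores for each datetime. The sketch below assumes the workbook uses Excel's 1900 date system, where 1.0 means 1900-01-01 00:00:00 and the fractional part is the time of day. excel_serial is a hypothetical helper written for this illustration, not a pandas or openpyxl function.

```python
from datetime import datetime, timedelta

def excel_serial(dt):
    """Approximate the 1900-date-system serial number for a datetime."""
    # Serial 1.0 corresponds to 1900-01-01 00:00:00
    serial = (dt - datetime(1899, 12, 31)) / timedelta(days=1)
    # Excel counts a non-existent 1900-02-29, so later dates shift by one
    if dt >= datetime(1900, 3, 1):
        serial += 1
    return serial

# Parsing a time-only string fills in the default date 1900-01-01
parsed = datetime.strptime('23:59:50', '%H:%M:%S')
shifted = parsed - timedelta(days=1)

print(excel_serial(parsed))   # between 1 and 2: a date plus a time
print(excel_serial(shifted))  # between 0 and 1: a time only
```

A serial below 1 has no whole-day part, which is what a cell holding only a time contains.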

Big thanks to everyone for help.

