Python program to convert Excel file to CSV, issue in date column from Excel
I am new to Python and am using the code below to convert an Excel file to CSV:
#!/bin/env python
import xlrd
import csv
from os import sys

def csv_from_excel(file1):
    workbook = xlrd.open_workbook(file1)
    worksheet = workbook.sheet_by_name('sheet1')
    csv1 = open('test.csv', 'wb')
    wr = csv.writer(csv1, quoting=csv.QUOTE_ALL)
    for rownum in xrange(worksheet.nrows):
        wr.writerow([unicode(entry).encode("utf-8") for entry in worksheet.row_values(rownum)])
    csv1.close()

if __name__ == "__main__":
    csv_from_excel(sys.argv[1])
However, an Excel row with the following values:
Case Code Date Amount
5428165773 UA02 4/23/2014 $(1,626.00)
is written to the CSV as:
'Case','Code','Date','Amount'
'5428165773','UA02',,'41752.0','-1626.0'
I also tried adding the following, but it didn't help:
dialect='excel', quotechar="'"
Excel stores a date as a floating point number that represents the number of days since a fixed date. You can use the datetime module to calculate the date and create a string:
import datetime
exceldate = datetime.date(1899, 12, 30)
d = exceldate + datetime.timedelta(days=41752)
print(repr(d))  # datetime.date(2014, 4, 23)
new_date = '{}/{}/{}'.format(d.month, d.day, d.year)
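The arithmetic above can be wrapped in a small helper. The following is a minimal sketch in Python 3 syntax, assuming the workbook uses Excel's default 1900 date system and that the serial number is 61 or greater (1 March 1900 onward, where the 1899-12-30 epoch lines up with Excel's calendar). The names `excel_serial_to_date` and `format_us_date` are hypothetical and not part of any library.

```python
import datetime

# Epoch that matches Excel's 1900 date system for serial numbers >= 61 (assumption).
EXCEL_EPOCH = datetime.date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Convert an Excel day serial (e.g. 41752.0) to a datetime.date."""
    # int() drops any fractional (time-of-day) part of the serial.
    return EXCEL_EPOCH + datetime.timedelta(days=int(serial))


def format_us_date(d):
    """Format a date as month/day/year without zero padding."""
    return '{}/{}/{}'.format(d.month, d.day, d.year)


d = excel_serial_to_date(41752.0)
print(format_us_date(d))  # 4/23/2014
```

Keeping the conversion and the formatting separate means the same date object can be written in another format (for example `d.isoformat()`) without repeating the epoch arithmetic.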
If you already have the pandas module installed, the following code reads the Excel file and stores it as a DataFrame:
import pandas as pd
xls = pd.read_excel('path_to_file.xls')
Then
xls.to_csv('path_to_csv.csv')
writes the DataFrame to a CSV file.
You can read more about this at:
http://pandas.pydata.org/pandas-docs/version/0.15.0/io.html#io-excel
http://pandas.pydata.org/pandas-docs/version/0.15.0/io.html#io-store-in-csv
I think the following function is what you need; it also deals with datetime.time:
import datetime
import xlrd

def xldate_to_python_date(value):
    """
    convert xl date to python date
    """
    # datemode 0 selects the 1900-based date system
    date_tuple = xlrd.xldate_as_tuple(value, 0)
    ret = None
    if date_tuple == (0, 0, 0, 0, 0, 0):
        ret = datetime.datetime(1900, 1, 1, 0, 0, 0)
    elif date_tuple[0:3] == (0, 0, 0):
        # no date part: time only
        ret = datetime.time(date_tuple[3],
                            date_tuple[4],
                            date_tuple[5])
    elif date_tuple[3:6] == (0, 0, 0):
        # no time part: date only
        ret = datetime.date(date_tuple[0],
                            date_tuple[1],
                            date_tuple[2])
    # a value with both a date and a time part matches no branch, so ret stays None
    return ret
Here is the documentation of xldate_as_tuple. The above function is referenced from here.
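To make the date-only, time-only, and date-plus-time cases concrete without needing xlrd, here is a minimal standard-library sketch in Python 3 syntax. It is not xlrd's implementation: it assumes the 1900 date system, a non-negative serial, and date serials of 61 or greater, and the name `serial_to_python` is hypothetical. Unlike the function above, it returns a datetime when both parts are present and treats 0.0 as midnight.

```python
import datetime

# Epoch for Excel's 1900 date system, valid for serials >= 61 (assumption).
EXCEL_EPOCH = datetime.datetime(1899, 12, 30)
SECONDS_PER_DAY = 86400


def serial_to_python(value):
    """Return a date, time, or datetime for a non-negative Excel serial."""
    days = int(value)
    # Round the fractional day to whole seconds to absorb float noise.
    seconds = int(round((value - days) * SECONDS_PER_DAY))
    if seconds == SECONDS_PER_DAY:
        # Rounding reached the next midnight; carry into the day count.
        days += 1
        seconds = 0
    if days == 0:
        # No date part: the cell holds only a time of day.
        midnight = datetime.datetime.min
        return (midnight + datetime.timedelta(seconds=seconds)).time()
    moment = EXCEL_EPOCH + datetime.timedelta(days=days, seconds=seconds)
    if seconds == 0:
        # No time part: the cell holds only a date.
        return moment.date()
    return moment


print(serial_to_python(41752.0))   # date only
print(serial_to_python(0.5))       # time only
print(serial_to_python(41752.75))  # date and time
```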
By the way, your csv_from_excel function can be rewritten as follows if you use my library pyexcel:
import pyexcel

def csv_from_excel(file1):
    excel_file = pyexcel.Reader(file1)
    csv_file = pyexcel.Writer("test.csv")
    csv_file.write_reader(excel_file)
    csv_file.close()
More documentation can be found on Read the Docs.
yr, mth, dy, hr, min, sec = xlrd.xldate_as_tuple(entry, 0)
This solved my problem.
Thanks, everyone, and first of all thanks to Alex for the very useful direction, right on the first attempt.
Update: my code looks like this now, but I am stuck at writing row-level output:
#!/bin/env python
import xlrd
import csv
from os import sys

def csv_from_excel(file1):
    workbook = xlrd.open_workbook(file1)
    worksheet = workbook.sheet_by_name('sheet1')
    csv1 = open('test.csv', 'wb')
    wr = csv.writer(csv1, quoting=csv.QUOTE_ALL)
    for rownum in xrange(worksheet.nrows):
        if rownum > 2:
            i = 0
            for entry in worksheet.row_values(rownum):
                i = i + 1
                if i == 3:
                    yr, mnth, dy, hr, min, sec = xlrd.xldate_as_tuple(entry, 0)
                    print str(mnth)+'/'+str(dy)+'/'+str(yr)
                    #wr.writerow(str(mnth)+'/'+str(dy)+'/'+str(yr))
                else:
                    print entry
                    #wr.writerow(unicode(entry).encode("utf-8"))
    csv1.close()

if __name__ == "__main__":
    csv_from_excel(sys.argv[1])
Current output: 5428165773 UA02 4/23/2014 -1626.0
I need the output above as 5428165773,UA02,4/23/2014,-1626.0
Please comment.
Update: this problem is solved as well, by using print in the for loops instead of writerow, since writerow expects a whole row.
Thanks
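Because writerow expects a whole row, passing it a single string makes each character its own field. An alternative to print is to build the complete row first and call writerow once per row. The following is a minimal sketch in Python 3 syntax, assuming the date is in the third column and the workbook uses the 1900 date system. The `rows` list is a stand-in for worksheet.row_values(rownum), and `convert_row` and `DATE_COLUMN` are hypothetical names.

```python
import csv
import datetime
import io

EXCEL_EPOCH = datetime.date(1899, 12, 30)  # 1900 date system (assumption)
DATE_COLUMN = 2  # zero-based index of the Date column (assumption)


def convert_row(row):
    """Return a copy of row with the date serial replaced by m/d/yyyy text."""
    out = []
    for index, entry in enumerate(row):
        if index == DATE_COLUMN:
            d = EXCEL_EPOCH + datetime.timedelta(days=int(entry))
            out.append('{}/{}/{}'.format(d.month, d.day, d.year))
        else:
            out.append(entry)
    return out


# Stand-in for worksheet.row_values(rownum); values taken from the question.
rows = [['5428165773', 'UA02', 41752.0, -1626.0]]

# In a script, use open('test.csv', 'w', newline='') instead of StringIO.
buffer = io.StringIO()
writer = csv.writer(buffer)
for row in rows:
    # One writerow call per row, passing the whole list of fields.
    writer.writerow(convert_row(row))

print(buffer.getvalue())
```

The default quoting is used here so that fields are written without quote characters, as in the desired output; passing quoting=csv.QUOTE_ALL as in the original code would wrap every field in quotes.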