
Python program to convert Excel file to CSV, issue in date column from Excel

I am new to Python and am using the code below to convert an Excel file to CSV.

The code is:

#!/bin/env python
import xlrd
import csv
from os import sys

def csv_from_excel(file1):
    workbook = xlrd.open_workbook(file1)
    worksheet = workbook.sheet_by_name('sheet1')
    csv1 = open('test.csv', 'wb')
    wr = csv.writer(csv1,quoting=csv.QUOTE_ALL)

    for rownum in xrange(worksheet.nrows):
        wr.writerow([unicode(entry).encode("utf-8") for entry in worksheet.row_values(rownum)])
    csv1.close()


if __name__ == "__main__":
    csv_from_excel(sys.argv[1])

But a row from Excel with the values below

Case    Code    Date    Amount
5428165773  UA02    4/23/2014    $(1,626.00)

shows up as

'Case','Code','Date','Amount'
'5428165773','UA02',,'41752.0','-1626.0'

I also tried adding this to the csv.writer call, but it didn't help:

dialect='excel', quotechar="'"

Excel stores a date as a floating-point number that represents the number of days since a fixed date. You can use the datetime module to calculate the date and create a string.

import datetime

exceldate = datetime.date(1899, 12, 30)  # day zero of Excel's 1900 date system

d = exceldate + datetime.timedelta(days=41752)

print(d)  # 2014-04-23

new_date = '{}/{}/{}'.format(d.month, d.day, d.year)  # '4/23/2014'
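The same arithmetic can be wrapped in a small helper so it can be applied to every date cell. This is only a sketch: it is written for Python 3, it assumes the workbook uses Excel's default 1900 date system, and it assumes serials of 61 or larger (earlier values are affected by Excel's fictitious 29 February 1900). The names `excel_serial_to_datetime` and `format_mdy` are illustrative and not part of any library.

```python
import datetime

# Day zero for Excel's 1900 date system (assumed; valid for serials >= 61).
EXCEL_EPOCH = datetime.datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial number (days since the epoch) to a datetime.

    The fractional part of the serial is the time of day.
    """
    return EXCEL_EPOCH + datetime.timedelta(days=float(serial))


def format_mdy(dt):
    """Format a date or datetime as M/D/YYYY without zero padding."""
    return '{}/{}/{}'.format(dt.month, dt.day, dt.year)


if __name__ == "__main__":
    print(format_mdy(excel_serial_to_datetime(41752.0)))  # 4/23/2014
```

If the workbook uses the 1904 date system instead, the epoch would differ, so the constant above should not be treated as universal.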

If you already have the pandas module installed, this code will read the Excel file and store it as a DataFrame:

import pandas as pd
xls = pd.read_excel('path_to_file.xls')

Then

xls.to_csv('path_to_csv.csv')

will write the DataFrame to a CSV file.

You can read more about this in the pandas documentation:

http://pandas.pydata.org/pandas-docs/version/0.15.0/io.html#io-excel

http://pandas.pydata.org/pandas-docs/version/0.15.0/io.html#io-store-in-csv

I think the following function is what you need, and it also deals with datetime.time:

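import datetime

import xlrd


def xldate_to_python_date(value):
    """
    Convert an Excel date value to a datetime.date or datetime.time.
    A zero value maps to datetime.datetime(1900, 1, 1); values that carry
    both a date and a time part match no branch and return None.
    """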
    date_tuple = xlrd.xldate_as_tuple(value, 0)
    ret = None
    if date_tuple == (0, 0, 0, 0, 0, 0):
        ret = datetime.datetime(1900, 1, 1, 0, 0, 0)
    elif date_tuple[0:3] == (0, 0, 0):
        ret = datetime.time(date_tuple[3],
                            date_tuple[4],
                            date_tuple[5])
    elif date_tuple[3:6] == (0, 0, 0):
        ret = datetime.date(date_tuple[0],
                            date_tuple[1],
                            date_tuple[2])
    return ret

Here is the documentation of xldate_as_tuple. The above function is referenced from here.

And by the way, your csv_from_excel function can be rewritten as follows if you use my library pyexcel:

import pyexcel

def csv_from_excel(file1):
    excel_file = pyexcel.Reader(file1)
    csv_file = pyexcel.Writer("test.csv")
    csv_file.write_reader(excel_file)
    csv_file.close()

More documentation can be found on read-the-docs.

yr, mth, dy, hr, min, sec = xlrd.xldate_as_tuple(entry, 0)

This solves my problem.

Thanks everyone, and first of all thanks to Alex for the very useful direction, right on the first attempt.

Update: My code looks like this now, but I am stuck at writing row-level output.

#!/bin/env python
import xlrd
import csv
from os import sys

def csv_from_excel(file1):
    workbook = xlrd.open_workbook(file1)
    worksheet = workbook.sheet_by_name('sheet1')
    csv1 = open('test.csv', 'wb')
    wr = csv.writer(csv1,quoting=csv.QUOTE_ALL)
    for rownum in xrange(worksheet.nrows):
        if rownum > 2:
            i=0
            for entry in worksheet.row_values(rownum):
                i=i+1
                if i==3:
                    yr, mnth, dy, hr, min, sec =xlrd.xldate_as_tuple(entry, 0)
                    print str(mnth)+'/'+str(dy)+'/'+str(yr)
                    #wr.writerow(str(mnth)+'/'+str(dy)+'/'+str(yr))
                else:
                    print entry
                    #wr.writerow(unicode(entry).encode("utf-8"))
    csv1.close()

if __name__ == "__main__":
    csv_from_excel(sys.argv[1])

Current output: 5428165773 UA02 4/23/2014 -1626.0

As you can see, I need the above output as 5428165773,UA02,4/23/2014,-1626.0

Please comment.

Update: This problem is solved as well, by using print in the for loops instead of writerow, since writerow expects a whole row.

Thanks
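The last update comes down to how `csv.writer.writerow` works: it treats its argument as one complete row, so passing a single string makes every character its own column. A minimal sketch of the alternative is to collect the converted cells into a list and call `writerow` once per row. It is written for Python 3, uses plain `datetime` arithmetic in place of `xlrd.xldate_as_tuple`, and assumes the date sits in the third column of a workbook using the 1900 date system; `convert_row`, `rows_to_csv` and `DATE_COLUMN` are hypothetical names chosen for illustration.

```python
import csv
import datetime
import io

EXCEL_EPOCH = datetime.date(1899, 12, 30)  # 1900 date system (assumed)
DATE_COLUMN = 2  # zero-based index of the date column (an assumption)


def convert_row(values):
    """Return a copy of the row with the date serial rendered as M/D/YYYY."""
    row = list(values)
    d = EXCEL_EPOCH + datetime.timedelta(days=int(row[DATE_COLUMN]))
    row[DATE_COLUMN] = '{}/{}/{}'.format(d.month, d.day, d.year)
    return row


def rows_to_csv(rows, stream):
    """Write each converted row with a single writerow call."""
    writer = csv.writer(stream)
    for values in rows:
        writer.writerow(convert_row(values))


if __name__ == "__main__":
    out = io.StringIO()
    rows_to_csv([['5428165773', 'UA02', 41752.0, -1626.0]], out)
    print(out.getvalue())
```

With a real workbook, the `rows` argument could be the lists returned by `worksheet.row_values(rownum)`, and in Python 3 the output file would be opened with `open('test.csv', 'w', newline='')` rather than `'wb'`.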

