How to get numbers to NOT appear as exponentials when converting a file from .xls to .csv using a Python script?

I am trying to insert Excel data into a PostgreSQL table using a Python script. I am running into an issue, though: some of the larger numbers get inserted in exponential (scientific) notation. I realized this happens when I convert the file from .xls to .csv. (I never open the .xls files, because I know Excel does some funky stuff where it saves larger numbers in exponential form.)

Is there an easy way to ensure the numbers don't get displayed as exponentials?

i.e., 812492400097 is being displayed as 8.12E+11

Here is the .csv conversion script:

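import xlrd
import unicodecsv
import sys
import os
import datetime


def csv_from_excel(xlsfile, csvfile):
    # Read only the first worksheet of the workbook
    wb = xlrd.open_workbook(xlsfile)
    sh = wb.sheet_by_index(0)
    # unicodecsv writes encoded bytes, so open the output in binary mode;
    # the with-block closes the file even if a row fails to write
    with open(csvfile, 'wb') as outputfile:
        wr = unicodecsv.writer(outputfile, quoting=unicodecsv.QUOTE_ALL)
        for rownum in xrange(sh.nrows):
            # row_values() returns every numeric cell as a Python float
            wr.writerow(sh.row_values(rownum))


def log(s):
    print str(datetime.datetime.now()) + ": " + s


#main
if len(sys.argv) < 2:
    print "Missing parameters: input xls file"
    sys.exit(1)

sourcefile = sys.argv[1]

# Swap only the extension; splitting on '.' breaks paths like ./data/report.xls
destfile = os.path.splitext(sourcefile)[0] + '.csv'

log("processing " + sourcefile)

csv_from_excel(sourcefile, destfile)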

I'm also wondering: instead of making sure the .csv doesn't turn numbers into exponentials, could I convert the exponentials back to numbers when inserting into the Postgres table?
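Converting back at insert time only works if the exponential string still carries every digit. A minimal sketch using Python's standard `decimal` module (the helper name `exp_to_int_text` is hypothetical): it parses the text exactly, without going through a binary float, and refuses values that are not whole numbers. If the CSV already holds a rounded form such as `8.12E+11`, the lost digits cannot be recovered this way.

```python
from decimal import Decimal, InvalidOperation


def exp_to_int_text(text):
    """Turn a string like '8.12492400097E+11' into '812492400097'.

    Hypothetical helper: non-numeric text is returned unchanged, and a
    ValueError is raised for infinities or numbers with a fractional part.
    """
    try:
        value = Decimal(text)  # exact decimal parse, no float rounding
    except InvalidOperation:
        return text  # not a number; leave it alone
    if not value.is_finite() or value != value.to_integral_value():
        raise ValueError("not a whole number: %r" % text)
    return str(int(value))


print(exp_to_int_text("8.12492400097E+11"))  # all 12 digits survive
print(exp_to_int_text("8.12E+11"))           # digits were already lost in the CSV
```

The second call shows the catch: `8.12E+11` parses fine, but only as 812000000000, so fixing the CSV output is the safer route.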

The xlrd module treats all numbers from Excel as floats, because Excel stores and calculates all numbers as floats:

https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966#sheet.Cell-class

  • Type symbol [Type number]: Python value
  • XL_CELL_EMPTY [0]: empty string u''
  • XL_CELL_TEXT [1]: a Unicode string
  • XL_CELL_NUMBER [2]: float
  • XL_CELL_DATE [3]: float
  • XL_CELL_BOOLEAN [4]: int; 1 means TRUE, 0 means FALSE
  • XL_CELL_ERROR [5]: int representing internal Excel codes; for a text representation, refer to the supplied dictionary error_text_from_code
  • XL_CELL_BLANK [6]: empty string u''. Note: this type will appear only when open_workbook(..., formatting_info=True) is used.
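Because XL_CELL_NUMBER and XL_CELL_DATE both arrive as floats (and booleans as ints), one option is to branch on the cell type before writing. A minimal sketch, assuming the type numbers listed above; xlrd exposes them as constants such as `xlrd.XL_CELL_NUMBER`, but they are redefined here so the example runs without xlrd. `cell_to_text` is a hypothetical helper, and date cells are deliberately passed through because converting them needs the workbook's datemode (xlrd's `xldate_as_tuple`).

```python
# Cell type numbers as listed in the xlrd documentation above
# (xlrd itself exposes them as xlrd.XL_CELL_NUMBER and so on).
XL_CELL_EMPTY = 0
XL_CELL_TEXT = 1
XL_CELL_NUMBER = 2
XL_CELL_DATE = 3
XL_CELL_BOOLEAN = 4


def cell_to_text(ctype, value):
    """Render one cell value for the CSV, based on its xlrd type.

    Hypothetical helper, not part of xlrd.
    """
    if ctype == XL_CELL_NUMBER:
        if float(value).is_integer():
            # Whole numbers such as IDs: write the exact integer digits
            return str(int(value))
        # Fractional values: repr keeps every digit, though it can still
        # fall back to exponent notation for very small or large values
        return repr(value)
    if ctype == XL_CELL_BOOLEAN:
        # xlrd reports booleans as int 1/0
        return "TRUE" if value else "FALSE"
    # Text, empty and date cells pass through unchanged; dates are still
    # Excel serial floats and need the workbook's datemode to convert
    return value


print(cell_to_text(XL_CELL_NUMBER, 812492400097.0))
```

Inside `csv_from_excel`, the plain `writerow` call could then become something like `wr.writerow([cell_to_text(t, v) for t, v in zip(sh.row_types(rownum), sh.row_values(rownum))])`.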

Presumably, your solution needs to control the formatting unicodecsv uses when it writes floats.
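One way to take that formatting out of the writer's hands is to turn floats into strings yourself before each row is written. A sketch of that idea, written against Python 3's standard `csv` module rather than unicodecsv (an assumption; the pre-formatting step is the same either way). The helper names `plain_float` and `write_rows` are hypothetical.

```python
import csv
import io
from decimal import Decimal


def plain_float(x):
    """Format a float in positional notation, never with an exponent.

    repr() gives the shortest string that round-trips to the same float;
    Decimal then re-renders exactly those digits in fixed-point form.
    """
    text = format(Decimal(repr(x)), "f")
    # Drop trailing zeros (and a bare '.') so whole numbers look like ints
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


def write_rows(rows, out):
    """Write rows to a CSV file object, pre-formatting any floats."""
    wr = csv.writer(out, quoting=csv.QUOTE_ALL)
    for row in rows:
        wr.writerow([plain_float(v) if isinstance(v, float) else v for v in row])


buf = io.StringIO()
write_rows([[u"id", 812492400097.0, 1e-05]], buf)
print(buf.getvalue())
```

Since the writer only ever receives strings, its own float conversion (whether `str` or `repr`) no longer matters, and no digits are rounded away.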

A previous question (How can I prevent csv.DictWriter() or writerow() rounding my floats?) indicates that the csv module used to use float.__str__ rather than float.__repr__, which caused rounding. unicodecsv might still use float.__str__.
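A small sketch of the difference described above. Python 2's `float.__str__` formatted with 12 significant digits (equivalent to `'%.12g'`, plus a trailing `.0` for whole numbers), whereas `repr` keeps enough digits to round-trip. Under Python 3 `str` and `repr` agree, so the `'%.12g'` call below is used to emulate the old `str` behaviour.

```python
value = 1234567890123.0  # 13 significant digits

py2_style_str = "%.12g" % value  # emulates Python 2's float.__str__
full_repr = repr(value)          # shortest string that round-trips

print(py2_style_str)  # rounded to 12 digits and switched to exponent form
print(full_repr)      # every digit kept
print(float(py2_style_str) == value)  # the rounded text no longer matches
```

A 13-digit whole number is enough to push the 12-digit format into exponent notation, and the last digit is lost on the way.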

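Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.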

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM