xlrd function to return the file type of a workbook object

The title says it all: is there a function in xlrd that returns the file type (MIME type, xls or xlsx, etc.) of the workbook that has just been opened with xlrd.open_workbook(fileName)? I can't find one in the documentation.

OK, after reading through the code of the __init__, book, and xlsx modules in xlrd on GitHub (https://github.com/python-excel/xlrd/), I see that there is no attribute of the Book object that returns the file type. The closest I can get is to use the log file and set verbosity to True:

import xlrd

def ReadSpreadsheet(filePath):
    myLog = open(filePath + '.log.txt', 'w')  # log file is created next to the input file
    myLog.write('Opening ' + filePath + '\n')
    wBook = xlrd.open_workbook(filePath, logfile=myLog, verbosity=True)  # xlrd logs what it finds
    myLog.close()  # not reached if open_workbook raises

This function writes a log file that shows the components of each file. Testing with four files, it is obvious from the log which files are recognized as xlsx, which are recognized as xls, and which are not recognized:

Office 2010 xlsx file:

>>> testing_xls.ReadSpreadsheet('MS.xlsx')

Opening MS.xlsx
ZIP component_names:
['[Content_Types].xml',
 '_rels/.rels',
 'xl/_rels/workbook.xml.rels',
 'xl/workbook.xml',
 'xl/sharedStrings.xml',
 'xl/worksheets/_rels/sheet1.xml.rels',
 'xl/theme/theme1.xml',
 'xl/styles.xml',
 'xl/worksheets/sheet1.xml',
 'docProps/core.xml',
 'xl/printerSettings/printerSettings1.bin',
 'docProps/app.xml']

Office 2010 xls file:

>>> testing_xls.ReadSpreadsheet('MS.xls')

Opening MS.xls
CODEPAGE: codepage 1200 -> encoding 'utf_16_le'
DATEMODE: datemode 0
Countries: (1, 1)

Colour indexes used:
[]

NOTE *** sheet 0 (u'Sheet1'): DIMENSIONS R,C = 26,9 should be 23,9

LibreOffice 4.2 xlsx file:

>>> testing_xls.ReadSpreadsheet('Libre.xlsx')

Opening Libre.xlsx
ZIP component_names:
[u'_rels/.rels',
 u'docProps/app.xml',
 u'docProps/core.xml',
 u'xl/_rels/workbook.xml.rels',
 u'xl/sharedStrings.xml',
 u'xl/worksheets/_rels/sheet1.xml.rels',
 u'xl/worksheets/sheet1.xml',
 u'xl/styles.xml',
 u'xl/workbook.xml',
 u'[Content_Types].xml']

LibreOffice 4.2 ODS file:

>>> testing_xls.ReadSpreadsheet('Libre.ods')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "testing_xls.py", line 6, in ReadSpreadsheet
    wBook = xlrd.open_workbook(filePath, logfile=myLog, verbosity=True)
  File "/usr/local/lib/python2.7/dist-packages/xlrd/__init__.py", line 422, in open_workbook
    raise XLRDError('Openoffice.org ODS file; not supported')
xlrd.biffh.XLRDError: Openoffice.org ODS file; not supported

[Nothing written to log file.]
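The four results above follow from the container format of each file: the xlsx files are ZIP archives containing xl/workbook.xml, the xls file is not a ZIP, and the ODS file is rejected. As an alternative that does not go through xlrd at all, the same distinction could be made by inspecting the file directly with the standard library. This is only a sketch under assumptions: sniff_spreadsheet_type is a hypothetical name, the OLE2 signature matches any compound file (old .doc and .ppt files too), and the ODS check relies on the "mimetype" entry that OpenDocument packages carry.

```python
import zipfile

# First 8 bytes of an OLE2 compound file, the container used by xls.
OLE2_SIGNATURE = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
ODS_MIMETYPE = b'application/vnd.oasis.opendocument.spreadsheet'


def sniff_spreadsheet_type(file_path):
    """Return 'XLS', 'XLSX', 'ODS', or None based on the file's contents."""
    with open(file_path, 'rb') as f:
        if f.read(8) == OLE2_SIGNATURE:
            # Any OLE2 file matches here, so 'XLS' is a best guess.
            return 'XLS'
    if zipfile.is_zipfile(file_path):
        with zipfile.ZipFile(file_path) as zf:
            names = zf.namelist()
            # Both xlsx listings above contain this component.
            if 'xl/workbook.xml' in names:
                return 'XLSX'
            if 'mimetype' in names and zf.read('mimetype').startswith(ODS_MIMETYPE):
                return 'ODS'
    return None
```

Calling sniff_spreadsheet_type('MS.xlsx') before xlrd.open_workbook would let the caller decide up front whether xlrd can be expected to read the file.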

I suppose I could catch the XLRDError and return ODS, or read the log file and return XLSX if component_names is found, and return XLS if codepage is found.
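That idea could be sketched roughly as follows. It assumes an xlrd release before 2.0, as in the tests above, which still opens xlsx files and writes the log lines shown earlier. classify_log and workbook_type are hypothetical names, and an in-memory io.StringIO buffer (Python 3) stands in for the log file on disk.

```python
import io


def classify_log(log_text):
    """Guess the workbook type from xlrd's verbose log output."""
    if 'ZIP component_names' in log_text:
        return 'XLSX'
    if 'CODEPAGE' in log_text:
        return 'XLS'
    return None


def workbook_type(file_path):
    """Open file_path with xlrd and return 'XLSX', 'XLS', 'ODS', or None."""
    # Imported here so classify_log can be used without xlrd installed.
    import xlrd

    log = io.StringIO()
    try:
        xlrd.open_workbook(file_path, logfile=log, verbosity=1)
    except xlrd.XLRDError as exc:
        # Matches the message seen above: 'Openoffice.org ODS file; not supported'
        if 'ODS' in str(exc):
            return 'ODS'
        raise
    return classify_log(log.getvalue())
```

The markers are matched against the exact log text shown above, so this depends on xlrd's diagnostic wording and could break if that wording changes between versions.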

