xlrd crashes when reading .xls file modified by PHPExcel

I'm having an extremely difficult time editing some Excel files using PHP and Python.

I originally did everything in PHP using PHPExcel, but I was processing very large files and PHPExcel crashed when it ran out of memory. So I changed it to do some of the work in PHP and the rest in Python.

So the process is:

  • Parse XML posted to the PHP script
  • Insert rows into the Excel (.xls) file based on the XML data
  • Pass the (.xls) file and the XML data to a Python script that populates the spreadsheet
  • e.g. python upload.py Example.xls data.xml, called by PHP
  • The Python script uses xlrd, xlwt and xlutils to populate the Excel file
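For readers following along, the Python half of that pipeline could look roughly like the sketch below. This is not the actual upload.py: the XML layout (`<cell row=".." col="..">`), the helper names `parse_cells` and `populate`, and the `.out.xls` output path are all assumptions made for illustration. Only `open_workbook`, `xlutils.copy.copy`, `get_sheet`, `write` and `save` are real library calls.

```python
import sys
import xml.etree.ElementTree as ET


def parse_cells(xml_text):
    """Parse a hypothetical <cells><cell row="0" col="1">text</cell></cells> layout."""
    root = ET.fromstring(xml_text)
    return [
        (int(cell.get("row")), int(cell.get("col")), cell.text or "")
        for cell in root.iter("cell")
    ]


def populate(xls_in, xls_out, cells):
    """Copy xls_in to xls_out, overwriting the given cells on the first sheet."""
    # Third-party imports stay local so parse_cells can be used without them.
    from xlrd import open_workbook
    from xlutils.copy import copy

    readonly = open_workbook(xls_in, formatting_info=True)
    writable = copy(readonly)      # xlwt workbook seeded from the xlrd one
    sheet = writable.get_sheet(0)  # assumption: the data goes on the first sheet
    for row, col, value in cells:
        sheet.write(row, col, value)
    writable.save(xls_out)


def main(argv):
    xls_path, xml_path = argv[1], argv[2]
    with open(xml_path, "rb") as handle:
        cells = parse_cells(handle.read())
    # Assumption: write to a separate file instead of overwriting the input.
    populate(xls_path, xls_path + ".out.xls", cells)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

Invoked as python upload.py Example.xls data.xml, this would fail at the `open_workbook` call with the same CompDocError if the input file triggers xlrd's sector check.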

The problem I'm having is that if the Python script modifies a regular .xls file that I created by hand, it works perfectly. But once PHPExcel has modified the Excel file, the Python script produces the following error:

_locate_stream(Workbook): seen
  0  5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 
 20  4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 
100= 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 
120  4 4 4 4 4 4 4 4 4 4 4 4 4 3 2 2 
File "upload.py", line 63, in <module>
workbook_readonly = open_workbook(excel,formatting_info=True,on_demand=True)
File "/home/student/eamorde/public_html/dining/xlrd/__init__.py", line 435, in open_workbook
ragged_rows=ragged_rows,
File "/home/student/eamorde/public_html/dining/xlrd/book.py", line 87, in open_workbook_xls
ragged_rows=ragged_rows,
File "/home/student/eamorde/public_html/dining/xlrd/book.py", line 619, in biff2_8_load
cd.locate_named_stream(UNICODE_LITERAL(qname))
File "/home/student/eamorde/public_html/dining/xlrd/compdoc.py", line 390, in locate_named_stream
d.tot_size, qname, d.DID+6)
File "/home/student/eamorde/public_html/dining/xlrd/compdoc.py", line 418, in _locate_stream
raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s]))
xlrd.compdoc.CompDocError: Workbook corruption: seen[2] == 4

So I dug through the source code of xlrd and found the line that is producing the error:

def _locate_stream(self, mem, base, sat, sec_size, start_sid, expected_stream_size, qname, seen_id):
    # print >> self.logfile, "_locate_stream", base, sec_size, start_sid, expected_stream_size
    s = start_sid
    if s < 0:
        raise CompDocError("_locate_stream: start_sid (%d) is -ve" % start_sid)
    p = -99 # dummy previous SID
    start_pos = -9999
    end_pos = -8888
    slices = []
    tot_found = 0
    found_limit = (expected_stream_size + sec_size - 1) // sec_size
    while s >= 0:
        if self.seen[s]:
            print("_locate_stream(%s): seen" % qname, file=self.logfile); dump_list(self.seen, 20, self.logfile)
            raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s]))

The last line is the one raising the exception:

raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s]))

Can anyone explain this? The file does not appear to be corrupted, since it opens fine in Excel, but xlrd seems to be unable to read it.

My PHP script does the following (rough sketch):

$phpExcel = new PHPExcel();
$file = "MyFile.xls";
$reader = new PHPExcel_Reader_Excel5();
$phpExcel = $reader->load($file);
//(... insert rows based on xml)
// The Excel5 writer has to be given the workbook object it will save
$writer = new PHPExcel_Writer_Excel5($phpExcel);
$writer->save('MyFile.xls');
exec("python upload.py MyFile.xls data.xml");

If anyone knows why this might be happening, or knows a better solution to my underlying problem (the PHPExcel memory issues), it would be greatly appreciated.

Edit: The source code for the file that's raising the error can be found here.

Edit: I created an example by taking my Excel file and removing any identifying information. To try it yourself, see the gist here.

I got the same error with one of my .xls files (Excel can open it just fine, but xlrd fails). As far as I can tell, the Compdoc.seen array keeps track of "FAT" sectors that have already been read. In my case, the block that reads the Root Entry (SSCS) marks all of those sectors as seen, which leads to the exception being raised later on. You can try to find the bug in the sector-reading logic and contribute a fix to xlrd :) or just comment out the lines that raise the exception, which will likely solve the problem in your case (as it did in mine), and wait for an xlrd update.
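To make the role of the seen array concrete, here is a toy model of that bookkeeping. It is a simplified sketch, not xlrd's actual code: `walk_chain`, `ChainError`, the six-entry allocation table and the owner ids are all made up for illustration. Each sector chain is followed through the allocation table, every visited sector is stamped with the id of the stream that claimed it, and the walk aborts if it reaches a sector that already carries a stamp.

```python
END_OF_CHAIN = -2  # negative marker that terminates a sector chain


class ChainError(Exception):
    """Raised when a chain reaches a sector that another stream already claimed."""


def walk_chain(sat, start_sid, seen, owner_id):
    """Follow a sector chain through `sat`, stamping each sector with owner_id."""
    sectors = []
    s = start_sid
    while s >= 0:
        if seen[s]:
            # Same kind of message as in the traceback: the sector index,
            # followed by the id of whoever claimed it first.
            raise ChainError("corruption: seen[%d] == %d" % (s, seen[s]))
        seen[s] = owner_id
        sectors.append(s)
        s = sat[s]  # the table maps each sector to its successor
    return sectors


# Toy allocation table: sectors 0 -> 1 -> 2 -> 3 form one chain, 4 -> 5 another.
sat = [1, 2, 3, END_OF_CHAIN, 5, END_OF_CHAIN]
seen = [0] * len(sat)

# A first stream (hypothetical owner id 4) claims sectors 0..3.
first = walk_chain(sat, 0, seen, owner_id=4)

# A second stream whose start sector lies inside that chain is rejected.
try:
    walk_chain(sat, 2, seen, owner_id=6)
except ChainError as exc:
    message = str(exc)
```

In this model the second walk fails with "corruption: seen[2] == 4", which has the same shape as the message in the question. Under that reading, the error means the Workbook stream starts at a sector that an earlier stream had already claimed. That is an overlap xlrd refuses to accept, though Excel evidently tolerates it.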

For anyone in a bind who needs to serve up data remotely: this is a hack, but commenting out line 419 ( raise CompDocError("%s corruption: seen[%d] == %d" % (qname, s, self.seen[s])) ) in compdoc.py seems to work fine.
