Python: checking that (Excel) files with different names have the same content

Question

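Q: Using Python, how can I test whether two Excel files with different names have the same content?

What I have tried: most answers I have seen suggest filecmp.cmp or a hash. I tried both, without success. Specifically, suppose 'f1.xlsx' has only two non-empty cells: A1 = 'hello' and B1 = 'world'. Next, copy and paste that content into a new file, 'f2.xlsx'. Both files now have exactly two non-empty entries in the same cell positions. I get the following results: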

>>> f1 = 'f1.xlsx'
>>> f2 = 'f2.xlsx'

#Using read():
>>> open(f1).read()==open(f2).read()
False

#Using filecmp.cmp:
>>> filecmp.cmp(f1, f2, shallow=True)
False

#Using izip:
>>> all(line1 == line2 for line1, line2 in izip_longest(f1, f2))
False

#Using hash:
>>> hash1=hashlib.md5()
>>> hash1.update(f1)
>>> hash1 = hash1.hexdigest()
>>> hash2=hashlib.md5()
>>> hash2.update(f2)
>>> hash2 = hash2.hexdigest()
>>> hash1==hash2
False

#also note, using getsize:
>>> os.path.getsize(f1)
8007
>>> os.path.getsize(f2)
8031

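Of course, I could use Pandas to read the Excel files as DataFrames and then use a standard comparison such as all() to get True, but I was hoping for a better way, for example one that would also work on .docx files.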

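Thanks in advance! I suspect the difficulty lies in using extensions such as .xlsx or .docx with the 'standard' tests, but I hope there is still an efficient way to compare the contents.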
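One likely reason the byte-level tests disagree is that .xlsx and .docx files are ZIP containers, so metadata such as stored timestamps can change the bytes even when the visible content is the same. The sketch below illustrates this with a stand-in archive built in memory; `file_digest`, `member_contents`, and `make_archive` are hypothetical helper names, and the XML payload is a placeholder rather than real spreadsheet markup. Note also that a hash has to be fed the file's bytes (read in binary mode), not the file name.

```python
import hashlib
import io
import zipfile


def file_digest(data: bytes) -> str:
    """SHA-256 of raw bytes (for a real file: open(path, 'rb').read())."""
    return hashlib.sha256(data).hexdigest()


def member_contents(data: bytes) -> dict:
    """Map each archive member name to its uncompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def make_archive(timestamp) -> bytes:
    """Build a tiny stand-in archive; only the stored timestamp varies."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("xl/worksheets/sheet1.xml", date_time=timestamp)
        zf.writestr(info, "<row><c>hello</c><c>world</c></row>")
    return buf.getvalue()


a = make_archive((2019, 3, 27, 10, 25, 26))
b = make_archive((2019, 3, 27, 10, 30, 0))

print(file_digest(a) == file_digest(b))          # False: the raw bytes differ
print(member_contents(a) == member_contents(b))  # True: the payload is the same
```

For real files, comparing archive members is probably still too strict, since parts other than the sheet data (document properties, for instance) typically differ between saves. Comparing parsed cell values is the more dependable route.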

Note: if it simplifies the problem, order does not matter, so if f2 has A1 = 'world' and B1 = 'hello', I would want True returned.
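If position really does not matter, one possible approach is to compare the two sheets as multisets of values. This is a minimal sketch, assuming each sheet has already been read into an iterable of row tuples with None for empty cells; `same_values_any_order` is a hypothetical name.

```python
from collections import Counter


def same_values_any_order(rows1, rows2):
    """True if both sheets hold the same non-empty values, ignoring position."""
    def count(rows):
        # Count every non-empty value, wherever it appears in the sheet.
        return Counter(v for row in rows for v in row if v is not None)
    return count(rows1) == count(rows2)


sheet1 = [("hello", "world")]
sheet2 = [("world", "hello")]
print(same_values_any_order(sheet1, sheet2))             # True
print(same_values_any_order(sheet1, [("hello", None)]))  # False
```

With openpyxl, rows in this shape should be obtainable from `ws.iter_rows(values_only=True)`, assuming a release recent enough to support the `values_only` argument.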

Answer 1

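I ran into the same problem in the past and ended up doing a 'line by line' comparison. For Excel files I use the openpyxl module, which has a nice interface for walking through a file cell by cell. For docx I used the python_docx module. The following code works for me: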

>>> import openpyxl
>>> from openpyxl import load_workbook
>>> from docx import Document

>>> # Each file is loaded twice here, so every comparison below returns True.
>>> f1 = Document('testDoc.docx')
>>> f2 = Document('testDoc.docx')
>>> wb1 = load_workbook('testBook.xlsx')
>>> wb2 = load_workbook('testBook.xlsx')
>>> s1 = wb1.active  # get_active_sheet() is deprecated in recent openpyxl releases
>>> s2 = wb2.active

>>> def comp_xl(s1, s2):
...     # Note: zip() stops at the shorter sheet, so extra rows or columns go undetected.
...     for row1, row2 in zip(s1.rows, s2.rows):
...         for cell_1, cell_2 in zip(row1, row2):
...             # Skip merged cells.
...             if isinstance(cell_1, openpyxl.cell.cell.MergedCell):
...                 continue
...             elif not cell_1.value == cell_2.value:
...                 return False
...     return True
...
>>> comp_xl(s1, s2)
True
>>> # One-line equivalent: pair up the rows, then the cells within each row.
>>> all(cell_1.value == cell_2.value for row1, row2 in zip(s1.rows, s2.rows) for cell_1, cell_2 in zip(row1, row2) if not isinstance(cell_1, openpyxl.cell.cell.MergedCell))
True

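>>> def comp_docx(f1, f2):
...     p1 = f1.paragraphs
...     p2 = f2.paragraphs
...     # Different paragraph counts mean different content (this also avoids an IndexError below).
...     if len(p1) != len(p2):
...         return False
...     for i in range(len(p1)):
...         if p1[i].text != p2[i].text:
...             return False
...     return True
...
>>> comp_docx(f1, f2)
True
>>> # One-line equivalent; zip() stops at the shorter document, so compare lengths too if needed.
>>> all(line1.text == line2.text for line1, line2 in zip(f1.paragraphs, f2.paragraphs))
True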

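It is very basic and obviously does not account for styles or formatting, but it is enough for testing the text content of two files. Hope this helps someone.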
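A comparison built on zip() stops at the shorter input, so a sheet with extra rows or columns can still compare equal. One way to tighten this, sketched here under the assumption that each sheet is available as an iterable of value tuples, is to pad with `itertools.zip_longest`; `rows_equal` is a hypothetical name, not part of openpyxl.

```python
from itertools import zip_longest


def rows_equal(rows1, rows2):
    """Cell-by-cell comparison that also catches extra rows or columns."""
    # Pad missing rows with an empty tuple and missing cells with None,
    # so a trailing empty cell counts the same as no cell at all.
    for row1, row2 in zip_longest(rows1, rows2, fillvalue=()):
        for v1, v2 in zip_longest(row1, row2, fillvalue=None):
            if v1 != v2:
                return False
    return True


base = [("hello", "world")]
print(rows_equal(base, [("hello", "world")]))              # True
print(rows_equal(base, [("hello", "world"), ("extra",)]))  # False
print(rows_equal(base, [("hello", "world", None)]))        # True
```

With openpyxl this could be called as `rows_equal(s1.iter_rows(values_only=True), s2.iter_rows(values_only=True))`, again assuming a release that supports `values_only`. For docx files, comparing `[p.text for p in f1.paragraphs]` with the same list built from `f2` gives the equivalent length-aware check.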

Solution 1
0 2019-03-27 10:25:26