Comparing two text files inside zip files using Python

I want to compare two text files that have the same name and the same relative path inside two different zip files, using Python.

I have searched for ways to do this, but none of the top solutions I found work in my case.

My code:

from zipfile import ZipFile
from pathlib import Path

with ZipFile(zip_path1) as z1, ZipFile(zip_path2) as z2:
    file1_paths = [Path(filepath) for filepath in z1.namelist()]
    file2_paths = [Path(filepath) for filepath in z2.namelist()]
    cmn = list(set(file1_paths).intersection(set(file2_paths)))
    common_files = [filepath for filepath in cmn if str(filepath).endswith(('.txt', '.sh'))]

    for f in common_files:
        with z1.open(f, 'r') as f1, z2.open(f, 'r') as f2:
            if f1.read() != f2.read(): # Also used io.TextIOWrapper(f1).read() here
                print('Difference found for {filepath}'.format(filepath=str(f)))

Note:

I have used pathlib for the paths here. In the line with z1.open(f, 'r')..., if I pass a pathlib path instead of a hardcoded string, I get <class 'KeyError'>: "There is no item named WindowsPath('SomeFolder/somefile.txt') in the archive".

Moreover, even if I hardcode the path, the buffer read from the file always comes back empty, so the comparison does not actually work.

I am stuck on this curious case, and any help is much appreciated.

You should be able to achieve this without using Path, since member names are specific to the zip file and do not need OS-specific handling. open() looks members up by the exact strings stored in the archive, so a Path object does not match any of them. The strings returned by namelist() can be used both for comparison and as arguments to open(), as follows:

from zipfile import ZipFile

with ZipFile(zip_path1) as z1, ZipFile(zip_path2) as z2:
    # Member names are plain strings, so intersect them directly; endswith() accepts a tuple of suffixes.
    common_files = [x for x in set(z1.namelist()) & set(z2.namelist()) if x.endswith(('.txt', '.sh'))]
    # print(common_files)

    for f in common_files:
        with z1.open(f) as f1, z2.open(f) as f2:
            if f1.read() != f2.read():
                print('Difference found for {filepath}'.format(filepath=str(f)))
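
To try the same idea without real archives on disk, here is a minimal sketch that wraps the comparison in a function and runs it on two in-memory zip files. The names differing_members and make_zip, and the sample member contents, are hypothetical and only for illustration; the function compares raw bytes via ZipFile.read(), so it does not try to normalize line endings or encodings.

```python
import io
from zipfile import ZipFile


def differing_members(zip_a, zip_b, suffixes=('.txt', '.sh')):
    """Return the sorted names of members found in both archives whose bytes differ."""
    with ZipFile(zip_a) as za, ZipFile(zip_b) as zb:
        # namelist() returns plain strings, which are also valid keys for read().
        common = set(za.namelist()) & set(zb.namelist())
        return sorted(
            name for name in common
            if name.endswith(suffixes) and za.read(name) != zb.read(name)
        )


def make_zip(members):
    """Build an in-memory zip from a {member_name: text} mapping (demo helper)."""
    buf = io.BytesIO()
    with ZipFile(buf, 'w') as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf


# Hypothetical sample data: one identical member, one differing member,
# and one member unique to each archive.
first = make_zip({
    'SomeFolder/somefile.txt': 'hello\n',
    'run.sh': 'echo 1\n',
    'only_first.txt': 'x',
})
second = make_zip({
    'SomeFolder/somefile.txt': 'hello\n',
    'run.sh': 'echo 2\n',
    'image.png': 'y',
})

result = differing_members(first, second)
print(result)
```

With real files, the same function should accept the two zip paths in place of the in-memory buffers, since ZipFile takes either a path or a file-like object.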

