How to retrieve a single 7zip file without extracting all of it in Python3.x?

Question

In Python, I want to browse all the subdirectories inside a 7z file and check its contents. I do not want to extract all the files, but I should be able to look into the contents iteratively or recursively.

The main concern is that the .7z file is 15 GB, but it unpacks to 225 GB, and my hard disk is only 160 GB. Of those 225 GB I probably need only about 60 GB of valid data, and I can only find that data if I can go through the individual files. Is there an os.walk-like function for a .7z file?

https://dumps.wikimedia.org/other/static_html_dumps/current/en/*.7z is the file I am exploring.

7z l *.7z

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (406E3),ASM,AES-NI)

Scanning the drive for archives:
1 file, 15363543213 bytes (15 GiB)

Listing archive: wikipedia-en-html.tar.7z

--
Path = wikipedia-en-html.tar.7z
Type = 7z
Physical Size = 15363543213
Headers Size = 100
Method = LZMA:22
Solid = -
Blocks = 1

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2008-06-18 23:32:15 ..... 223674511360  15363543113  wikipedia-en-html.tar
------------------- ----- ------------ ------------  ------------------------
2008-06-18 23:32:15       223674511360  15363543113  1 files

import lzma

f7file = r"C:\Users\padmaraj.bhat\OneDrive - Accenture\Downloads\wiki-html\wikipedia-en-html.tar.7z"

f = lzma.open(f7file, 'rb')
for line in f:
    lzma.decompress(line)
    break

Traceback (most recent call last)
  <ipython-input-5-d1a496a0c194> in <module>()
      4 
      5 f = lzma.open(f7file, 'rb')
----> 6 for line in f:
      7     lzma.decompress(line)
      8     break

  ~\AppData\Local\Continuum\anaconda3\lib\lzma.py in readline(self, size)
    220         """
    221         self._check_can_read()
--> 222         return self._buffer.readline(size)
    223 
    224     def write(self, data):

  ~\AppData\Local\Continuum\anaconda3\lib\_compression.py in readinto(self, b)
     66     def readinto(self, b):
     67         with memoryview(b) as view, view.cast("B") as byte_view:
---> 68             data = self.read(len(byte_view))
     69             byte_view[:len(data)] = data
     70         return len(data)

  ~\AppData\Local\Continuum\anaconda3\lib\_compression.py in read(self, size)
    101                 else:
    102                     rawblock = b""
--> 103                 data = self._decompressor.decompress(rawblock, size)
    104             if data:
    105                 break

LZMAError: Input format not supported by decoder

Answer 1

When I had to do something like this, I called the 7z CLI through the subprocess module. That way you can get both the file listing and the file contents from the archive.

For example, to extract files directly to stdout, use the -so option.
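
A minimal sketch of that approach, assuming the 7z executable is on the PATH and that the archive holds a single uncompressed tar, as the listing in the question shows. The function names (build_7z_stdout_command, iter_tar_members, scan_archive) and the wanted predicate are hypothetical, chosen only for illustration. The idea is to pipe the output of "7z x -so" into tarfile in streaming mode, so the 225 GB tar is never written to disk:

```python
import subprocess
import tarfile


def build_7z_stdout_command(archive_path, seven_zip="7z"):
    """Return the command that decompresses the archive to stdout."""
    # "x" extracts; "-so" sends the extracted data to stdout instead of disk.
    # Assumption: the executable is called "7z" and is on the PATH.
    return [seven_zip, "x", "-so", archive_path]


def iter_tar_members(stream, wanted):
    """Yield (name, data) for regular files in a tar stream accepted by `wanted`."""
    # Mode "r|" reads the tar strictly sequentially, which is all a pipe allows.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            if member.isfile() and wanted(member.name):
                # In streaming mode the data must be read before moving on
                # to the next member.
                yield member.name, tar.extractfile(member).read()


def scan_archive(archive_path, wanted):
    """Stream the tar inside a .7z through a pipe and yield matching files."""
    proc = subprocess.Popen(
        build_7z_stdout_command(archive_path), stdout=subprocess.PIPE
    )
    try:
        yield from iter_tar_members(proc.stdout, wanted)
    finally:
        # Closing the pipe lets 7z stop if the caller quits early.
        proc.stdout.close()
        proc.wait()
```

It could be used like this: for name, data in scan_archive("wikipedia-en-html.tar.7z", lambda n: n.endswith(".html")), then inspect data and save only what is needed. Because the archive is a single LZMA block wrapping one tar, there is no random access: every run decompresses the stream from the start, but only the files you choose to keep take up disk space.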

solution1
0 2019-01-12 09:04:24
