In Python, I want to browse all the subdirectories inside a .7z file and check its contents. I do not want to extract all the files, but I should be able to peek into the contents iteratively or recursively.
The main concern is that the .7z archive is 15 GB, but it is 225 GB when unpacked, and my hard disk is only 160 GB. Of those 225 GB I probably need only about 60 GB of valid data. I can find that data only if I can go through the individual files. Is there an os.walk-like function for .7z files?
https://dumps.wikimedia.org/other/static_html_dumps/current/en/*.7z is the file I am exploring.
7z l *.7z
7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (406E3),ASM,AES-NI)
Scanning the drive for archives:
1 file, 15363543213 bytes (15 GiB)
Listing archive: wikipedia-en-html.tar.7z
--
Path = wikipedia-en-html.tar.7z
Type = 7z
Physical Size = 15363543213
Headers Size = 100
Method = LZMA:22
Solid = -
Blocks = 1
Date Time Attr Size Compressed Name
------------------- ----- ------------ ------------ ------------------------
2008-06-18 23:32:15 ..... 223674511360 15363543113 wikipedia-en-html.tar
------------------- ----- ------------ ------------ ------------------------
2008-06-18 23:32:15 223674511360 15363543113 1 files
import lzma

f7file = r"C:\Users\padmaraj.bhat\OneDrive - Accenture\Downloads\wiki-html\wikipedia-en-html.tar.7z"

f = lzma.open(f7file, 'rb')
for line in f:
    lzma.decompress(line)
    break
Traceback (most recent call last)
<ipython-input-5-d1a496a0c194> in <module>()
4
5 f = lzma.open(f7file, 'rb')
----> 6 for line in f:
7 lzma.decompress(line)
8 break
~\AppData\Local\Continuum\anaconda3\lib\lzma.py in readline(self, size)
220 """
221 self._check_can_read()
--> 222 return self._buffer.readline(size)
223
224 def write(self, data):
~\AppData\Local\Continuum\anaconda3\lib\_compression.py in readinto(self, b)
66 def readinto(self, b):
67 with memoryview(b) as view, view.cast("B") as byte_view:
---> 68 data = self.read(len(byte_view))
69 byte_view[:len(data)] = data
70 return len(data)
~\AppData\Local\Continuum\anaconda3\lib\_compression.py in read(self, size)
101 else:
102 rawblock = b""
--> 103 data = self._decompressor.decompress(rawblock, size)
104 if data:
105 break
LZMAError: Input format not supported by decoder
When I had to do something like that, I called the 7z CLI via the subprocess module. That way you can get both the file list and the file contents from the archive.
For example, the -so option extracts files directly to stdout.
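As a rough sketch of that approach (not a tested recipe for this particular dump): the listing above shows the archive holds a single wikipedia-en-html.tar, so the output of 7z x -so can be piped into tarfile in streaming mode and walked member by member without writing the 225 GB to disk. The function names iter_tar_members and walk_7z_tar are made up for illustration, and the code assumes a 7z executable is on the PATH. Incidentally, lzma.open fails in the question because the lzma module reads .xz and legacy .lzma streams, not the 7z container format.

```python
import subprocess
import tarfile


def iter_tar_members(stream):
    """Yield (name, size, fileobj) for each regular file in a tar stream.

    Hypothetical helper. fileobj must be read before advancing to the
    next member, because the stream cannot seek backwards.
    """
    # mode="r|" reads strictly sequentially, so a non-seekable pipe works.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, member.size, tar.extractfile(member)


def walk_7z_tar(archive_path, seven_zip="7z"):
    """Stream the tar stored inside a .7z archive through iter_tar_members.

    Assumes the archive contains exactly one .tar file and that the
    7z command-line tool is installed; "x -so" writes the extracted
    data to stdout instead of to disk.
    """
    proc = subprocess.Popen(
        [seven_zip, "x", "-so", archive_path],
        stdout=subprocess.PIPE,
    )
    try:
        yield from iter_tar_members(proc.stdout)
    finally:
        # Stop 7z if the caller quits early, then reap the process.
        proc.stdout.close()
        proc.terminate()
        proc.wait()
```

A caller could loop with "for name, size, fobj in walk_7z_tar(f7file):" and call fobj.read() only for the names it cares about, saving just those to disk. Because the archive is a single compressed block, 7z still has to decompress everything in order; the saving is in disk space, not in decompression time.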