
Python: Writing .gz compressed file from a list

I am trying to write the contents of a list into a .gz compressed file using the gzip module. At the moment I write a .csv file with the contents of the list and then compress that into .gz format. I am looking for a straightforward approach that does not write the contents into an intermediate .csv file.

Current Python code (working):

import re, gzip, csv
from collections import defaultdict

X = [['Apple','x','x','x','x','x'], ['Orange','y','y','y','y','y'], ['Banana','y','y','y','y','y']]

with open('new.csv', "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(X)
        f.close()
Y = open('new.csv', "r").readlines()
b = defaultdict(list)
bkp_filter = ['Apple', 'Orange']
for x in Y:
    for bkp in bkp_filter:
        if re.search(fr'\b{bkp}\b', x):
            b[bkp].append(x)

for k, v in b.items():
    with gzip.open('newzip.gz', 'a') as zip:
        for y in v:
            zip.write(y.encode())
    zip.close()

List X has three sublists (Apple, Orange, Banana), and there is a second filter list (bkp_filter) containing Apple and Orange. Using re.search, the code writes into the .gz file only the rows that match an item in bkp_filter.

output

newzip.gz
Apple,x,x,x,x,x
Orange,y,y,y,y,y

The challenge I am facing: I want to change this code so that it writes the .gz file without writing any .csv file, i.e. by reading the list X directly.

I have tried this:

#Y = open('new.csv', "r").readlines()
b = defaultdict(list)
bkp_filter = ['Apple', 'Orange']
for x in X:
    for bkp in bkp_filter:
        if re.search(fr'\b{bkp}\b', x):
            b[bkp].append(x)

This raises TypeError: expected string or bytes-like object

After changing the condition to if re.search(fr'\b{bkp}\b', str(x)): I got a new error:

    zip.write(y.encode())
AttributeError: 'list' object has no attribute 'encode'

Then I tried changing zip.write(y.encode()) to just zip.write(y), and got the error below:

    zip.write(y)
  File "C:\Users\madmax\anaconda3\lib\gzip.py", line 260, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'list' 

Expected output: the same .gz file output, but without writing the content into a .csv file.

Please help. Thanks in advance.

The trouble is that your x is a list:

for x in X:
    for bkp in bkp_filter:
        if re.search(fr'\b{bkp}\b', x):
                                    ^ this is a list, e.g. ['Apple', 'x', 'x', 'x', 'x', 'x']

Change it to the following, which tests only the first cell of each row:

        if re.search(fr'\b{bkp}\b', x[0]):
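To make the fix concrete, here is a minimal sketch of the filtering step working on the list directly. The helper name group_rows is made up for illustration, and re.escape is an extra precaution that is not in the question's code:

```python
import re
from collections import defaultdict

X = [['Apple', 'x', 'x', 'x', 'x', 'x'],
     ['Orange', 'y', 'y', 'y', 'y', 'y'],
     ['Banana', 'y', 'y', 'y', 'y', 'y']]
bkp_filter = ['Apple', 'Orange']

def group_rows(rows, names):
    """Group rows by the filter name found in their first cell (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in rows:
        for name in names:
            # str() keeps re.search happy if the first cell is not a string;
            # re.escape guards against names containing regex metacharacters.
            if re.search(fr'\b{re.escape(name)}\b', str(row[0])):
                grouped[name].append(row)
    return grouped

b = group_rows(X, bkp_filter)
```

Each value in b is still a list of rows (lists), which is why the write step further on also needs adjusting.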

The same problem occurs further on, where y is again a list:

        zip.write(y.encode())

You have several options. If you know all members of y are strings, you can do:

y_line = ",".join(y) + "\n"
zip.write(y_line.encode())

Otherwise, convert each cell to a string before joining:

y_line = ",".join(str(cell) for cell in y)
zip.write((y_line + "\n").encode())

You can also open the gzip file in text mode, which eliminates the need for those encode() calls:

for k, v in b.items():
    with gzip.open('newzip.gz', 'at') as zip:
        for y in v:
            zip.write(",".join(y))
            zip.write("\n")
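To illustrate the difference between the two modes, here is a small sketch. The file name demo.gz and the temporary directory are only for the demonstration. In binary mode (the default for gzip.open) write() wants bytes, while in text mode it takes str:

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.gz')

# Binary mode: write() needs bytes, so a str has to be encoded first.
with gzip.open(path, 'wb') as gz:
    gz.write("Apple,x\n".encode())
    try:
        gz.write("Orange,y\n")        # a str in binary mode is rejected
        binary_accepts_str = True
    except TypeError:
        binary_accepts_str = False

# Text mode: write() takes str and does the encoding for you.
with gzip.open(path, 'at') as gz:
    gz.write("Orange,y\n")

# Read everything back as text to check the result.
with gzip.open(path, 'rt') as gz:
    lines = gz.read().splitlines()
```

The TypeError caught here is the same kind of error as the memoryview one in the question, just triggered by a str instead of a list.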

Two more notes:

  • When you use with ... open(...) as something, do not call close() yourself. The with construct makes sure the file is properly closed when the block is left.

  • I believe the loops that write the gzip file in your example should be nested the other way around. You keep reopening the same gzip file just to append more data. Open it once and write everything in one go.

Like this:

with gzip.open('newzip.gz', 'at') as zip:
    for k, v in b.items():
        for y in v:
            zip.write(",".join(y))
            zip.write("\n")

And if the items are not strings:

with gzip.open('newzip.gz', 'at') as zip:
    for k, v in b.items():
        for y in v:
            zip.write(",".join(str(cell) for cell in y))
            zip.write("\n")
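One caveat about the 'at' mode used in these snippets: it appends, so running the script twice leaves every row in the file twice. If the file should be rebuilt on each run, 'wt' is probably what you want. A small sketch, assuming a temporary path and made-up two-column rows:

```python
import gzip
import os
import tempfile

rows = [['Apple', 'x'], ['Orange', 'y']]
path = os.path.join(tempfile.mkdtemp(), 'newzip.gz')

def write_rows(mode):
    # Write all rows as comma-separated lines using the given open mode.
    with gzip.open(path, mode) as gz:
        for row in rows:
            gz.write(",".join(row) + "\n")

def read_lines():
    # Read the whole file back as a list of text lines.
    with gzip.open(path, 'rt') as gz:
        return gz.read().splitlines()

write_rows('at')
write_rows('at')            # second "run" appends to the existing file
appended = read_lines()

write_rows('wt')            # 'wt' truncates and starts over
overwritten = read_lines()
```

After the two appending runs the file holds four lines. After the 'wt' run it holds two again.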

You can also use csv.writer to write directly into the gzip file, as long as the file is opened in text mode. Something like this (note that it writes every row of X, without filtering):

import re, gzip, csv
from collections import defaultdict

X = [['Apple','x','x','x','x','x'], ['Orange','y','y','y','y','y'], 
     ['Banana','y','y','y','y','y']]

with gzip.open('newzip.gz', 'at') as zip:
    #                         ^
    writer = csv.writer(zip)
    writer.writerows(X)
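Putting the pieces together, here is a sketch of the whole task with no intermediate .csv file. Several things here are my assumptions and not part of the code above: the filter is an exact match on the first cell (a set lookup in place of re.search), the file is opened with 'wt' so reruns do not duplicate rows, newline='' is passed as the csv documentation recommends for files handed to csv.writer, and the output goes to a temporary directory:

```python
import csv
import gzip
import os
import tempfile

X = [['Apple', 'x', 'x', 'x', 'x', 'x'],
     ['Orange', 'y', 'y', 'y', 'y', 'y'],
     ['Banana', 'y', 'y', 'y', 'y', 'y']]
bkp_filter = ['Apple', 'Orange']
path = os.path.join(tempfile.mkdtemp(), 'newzip.gz')

wanted = set(bkp_filter)

# csv.writer only needs an object with write(), so the text-mode gzip file works.
with gzip.open(path, 'wt', newline='') as gz:
    writer = csv.writer(gz)
    # Keep only the rows whose first cell is one of the wanted names.
    writer.writerows(row for row in X if row[0] in wanted)

# Read the compressed file back to check what was written.
with gzip.open(path, 'rt', newline='') as gz:
    rows_back = list(csv.reader(gz))
```

Rows come out in the order they appear in X, not grouped by filter name. For this data the result is the same either way.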
