
gzip write newline to file python

I am trying to write an iterable of tuples to a file using gzip in Python. But when I write the newline character (\n), it does not show up in the output.

For example:
   if the iterable of tuples is like this: [(1,2,3), (4,5)]
   the output file should be: 1,2,3
                              4,5

   but I got: 1,2,34,5

   I don't know where my newline character has gone!

   Here is my code:
      fi = gzip.open(filename, "wb")
      for tup in data:
        fi.write(','.join(str(x) for x in tup).encode("utf-8"))
        fi.write("\n".encode("utf-8"))
      fi.close()

I can only assume that there is a problem with the way you are reading or displaying the uncompressed data. I tried the following code on Windows and Linux (Python 2.7) and it worked:

import gzip

filename = 'gzipout.gz'
data =  [(1,2,3) , (4,5)]
fi = gzip.open(filename, 'wb')
for tup in data:
    fi.write(','.join(str(x) for x in tup).encode("utf-8"))
    fi.write('\n'.encode("utf-8"))
fi.close()

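# Read the file back; gzip decompresses it transparently.
fi = gzip.open(filename, 'rb')
unzipdata = fi.read()
print(unzipdata.decode("utf-8"))  # parenthesized so it runs on Python 2.7 and 3
fi.close()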

The output was:

1,2,3
4,5

This code simply gzips the contents to a file, then reads the data back (decompressing it) and dumps it to the console as is. The newline is present.

If I use gunzip gzipout.gz, it extracts to gzipout, and when I display the contents the newline is also present.
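
A quick way to rule out a display problem is to look at the raw decompressed bytes rather than at how a viewer renders them. The sketch below is a minimal check, assuming Python 3; the helper names write_rows and read_raw and the temporary file name are made up for illustration.

```python
import gzip
import os
import tempfile


def write_rows(path, rows, eol=b"\n"):
    """Write each tuple as one comma-separated line, gzip-compressed."""
    with gzip.open(path, "wb") as fh:
        for row in rows:
            fh.write(",".join(str(x) for x in row).encode("utf-8"))
            fh.write(eol)


def read_raw(path):
    """Return the decompressed bytes exactly as stored."""
    with gzip.open(path, "rb") as fh:
        return fh.read()


# Hypothetical file name in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "gzipout.gz")
write_rows(demo_path, [(1, 2, 3), (4, 5)])
raw = read_raw(demo_path)

# repr() shows control characters explicitly, so a newline cannot hide.
print(repr(raw))
```

If the printed value shows \n between the rows, the file itself is fine and the program used to view it is the likely culprit.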

The behavior you describe isn't uncommon, especially if you use an old brain-dead program to open the uncompressed text file. In the *nix world an end of line (EOL) is generally denoted by \n. In Windows, EOL is represented by two characters, \r\n. In text mode, Python automatically converts \n to whatever the EOL is on the platform when writing. Unfortunately, gzip still doesn't seem to honor text mode flags in Python 2.7. This means that even if you open a gzip file for writing with mode "U" (text mode + universal newlines), no translation is done on each write.

If you are on a Windows platform and targeting Windows users, then you might consider the non-portable solution of explicitly writing '\r\n' so that brain-dead editors like Notepad will render the line breaks properly. I am guessing that something like this would yield the results you are looking for:

for tup in data:
    fi.write(','.join(str(x) for x in tup).encode("utf-8"))
    fi.write('\r\n'.encode("utf-8")) # notice I use \r\n instead of \n
fi.close()
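
On Python 3.3 or later, gzip.open also accepts text modes and a newline argument, so the line ending can be chosen once when opening the file instead of in every write. This is only a sketch under that assumption, not something the Python 2.7 code above can do; the function name write_rows_text and the file name are hypothetical.

```python
import gzip
import os
import tempfile


def write_rows_text(path, rows, newline="\r\n"):
    """Write tuples as comma-separated lines through gzip's text mode."""
    # "wt" opens in text mode; every "\n" written is translated to `newline`.
    with gzip.open(path, "wt", encoding="utf-8", newline=newline) as fh:
        for row in rows:
            fh.write(",".join(str(x) for x in row) + "\n")


# Hypothetical file name in a temporary directory.
text_path = os.path.join(tempfile.mkdtemp(), "gzipout_crlf.gz")
write_rows_text(text_path, [(1, 2, 3), (4, 5)])

# Read back in binary mode to see the bytes that were actually stored.
with gzip.open(text_path, "rb") as fh:
    crlf_raw = fh.read()
print(repr(crlf_raw))
```

Passing newline="\n" instead would keep Unix line endings regardless of the platform the script runs on.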
