
Java: is it possible to parse and modify gzipped XML files without unzipping them?

I have an ArrayList of gzipped XML files. Is it possible to view and manipulate the contents of these XML files without unzipping them to disk and taking up space? If so, which classes would be the right ones to use for this task?

I know I can create a GZIPInputStream from a FileInputStream of the gzip file, but from there I'm not sure what to do. This is all I have written so far:

 GZIPInputStream in = new GZIPInputStream(new FileInputStream(zippedFiles.get(i)));

I need some way to parse the text within the XML files and modify the XML itself, but again, extracting all of them would take up too much disk space.

What exactly are you trying to achieve? You can extract the file into memory using a ByteArrayOutputStream and convert the result into a byte array that you hand to your XML parser library. Converting it to a String and passing that is not recommended: the encoding is declared inside the XML file itself, so the conversion from bytes to characters should be done by the XML parser internally. Most XML parsers also accept any InputStream directly, so you could pass your GZIPInputStream straight to the parser, which will probably reduce memory consumption further. Disk space is only used when you write the data back, which is simply the reverse of the procedure above (serialize the XML into a GZIPOutputStream). Because you overwrite the source file directly, no extra disk space is used.
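The approach described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name `GzipXmlDom` and its two helper methods are hypothetical names chosen for illustration, and it assumes each document is small enough to fit in memory as a DOM tree.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class (not from the original post).
public class GzipXmlDom {

    // Decompresses on the fly and lets the parser detect the encoding itself.
    static Document read(Path gzFile) throws Exception {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
    }

    // Serializes the DOM tree and compresses it while writing.
    // Overwrites gzFile; the document must already be fully loaded in memory.
    static void write(Document doc, Path gzFile) throws Exception {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gzFile))) {
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
        }
    }
}
```

A caller would use `read` to get the `Document`, change it through the usual DOM API (for example `setTextContent` on a node), and then call `write` with the same path. Note that if the write fails halfway, the original file is already truncated, so writing to a temporary file and renaming it afterwards is the safer variant.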

The fact that the files are in a list doesn't change much, but the short answer is no.

Ignoring compression, files are stored linearly on disks. You can append to them cheaply, you can replace bytes cheaply, but you can't replace sequences of different lengths (like replace("Testing Procedure Specification", "TPS") ) without rewriting the file after the modified substring.
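To illustrate the cheap case, here is a small sketch of overwriting bytes in place on an uncompressed file with `RandomAccessFile`. The class and method names are hypothetical, and the sketch assumes ASCII content so that one character is one byte. It only works because the replacement occupies the same bytes as the text it overwrites; a shorter or longer replacement would require shifting everything after it, which means rewriting the rest of the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical demo class (not from the original post).
public class InPlaceOverwrite {

    // Overwrites bytes starting at the given offset. Nothing is shifted:
    // the file keeps its length unless the write runs past the end.
    static void overwrite(Path file, long offset, String replacement) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(replacement.getBytes(StandardCharsets.US_ASCII));
        }
    }
}
```

For a file containing `Hello World`, calling `overwrite(file, 6, "Earth")` touches only five bytes. The same trick does not carry over to a gzipped file, because changing the uncompressed content changes the compressed byte stream that follows.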

Gzipping the file complicates things, but the same rule applies. In general, making arbitrary modifications to a file requires rewriting the file.

Your code for reading the files is on the right track, though. You can easily read through gzipped files as streams, without having to decompress the entire file first.
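One way to combine both points (stream the data, but accept that the file gets rewritten) is to read XML events from the gzipped source and write them, modified, to a gzipped temporary file that then replaces the original. The following is a sketch under assumptions: the class name `GzipXmlStreamEdit` and the method `replaceText` are hypothetical, the output is written as UTF-8, and the edit is a plain text replacement inside character data only.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class (not from the original post).
public class GzipXmlStreamEdit {

    // Streams the XML through StAX, replacing target with replacement in text nodes.
    // Only one event at a time is held in memory; the uncompressed XML never hits the disk.
    static void replaceText(Path gzFile, String target, String replacement) throws Exception {
        Path tmp = Files.createTempFile(gzFile.toAbsolutePath().getParent(), "edit", ".gz");
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // Merge adjacent text chunks so the target is not split across events.
        inputFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventFactory events = XMLEventFactory.newInstance();

        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile));
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
            XMLEventReader reader = inputFactory.createXMLEventReader(in);
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters()) {
                    String text = event.asCharacters().getData().replace(target, replacement);
                    event = events.createCharacters(text);
                }
                writer.add(event);
            }
            writer.flush();
            writer.close();
            reader.close();
        }
        // Swap the rewritten file in place of the original.
        Files.move(tmp, gzFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

During the rewrite, the only extra disk space needed is one compressed temporary file, which is removed from the picture again by the final move.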
