Accessing files inside folders in a zipfile

I would like to access files (XML files) inside a zip file in order to do some filtering on them. How can I descend into the folders inside the zip file to reach those files? My problem is that I cannot access files through zip_file.namelist() if they are inside folders. Here is my code:

import sys, getopt
from lxml import etree
from io import StringIO
import zipfile

def main(argv):

    inputfile = ''
    outputfile = ''
    try:
       opts, args = getopt.getopt(argv,"hi:o:",["ifile=","ofile="])
    except getopt.GetoptError:
       print 'test.py -i <inputfile> -o <outputfile>'
       sys.exit(2)
    for opt, arg in opts:
       if opt == '-h':
          print 'test.py -i <inputfile> -o <outputfile>'
          sys.exit()
       elif opt in ("-i", "--ifile"):
          inputfile = arg
       elif opt in ("-o", "--ofile"):
          outputfile = arg

    archive = zipfile.ZipFile(inputfile, 'r')

    with archive as zip_file:
      for file in zip_file.namelist():
          if file.endswith(".amd"):
              try:

                  print("Process the file")
                  xslt_root = etree.XML('''\
                    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 

                    <xsl:template match="node() | @*">
                      <xsl:copy>
                          <xsl:apply-templates select="node() | @*"/>
                      </xsl:copy>
                    </xsl:template>


                    <xsl:template match="TimeStamp"/>
                    <xsl:template match="@timeStamp"/>
                    <xsl:template match="TimeStamps"/>
                    <xsl:template match="Signature"/>

                    </xsl:stylesheet>
                    ''')

                  transform = etree.XSLT(xslt_root)

                  doc = etree.parse(zip_file.open(file))
                  result_tree = transform(doc)

                  resultfile = unicode(str(result_tree))
                  zip_file.write(resultfile)


              finally:
                  zip_file.close()

if __name__=='__main__':
     main(sys.argv[1:])

Exception 1: It cannot read "ex4_linktime/" because this is a folder and not a file!

 File "parser.pxi", line 1110, in lxml.etree._BaseParser._parseDocFromFile     (src\lxml\lxml.etree.c:96832)
 File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:91290)
 File "parser.pxi", line 683, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:92476)
 File "parser.pxi", line 620, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:91737)
 IOError: Error reading file 'ex4_linktime/': failed to load external entity "ex4_linktime/"

Exception 2: It does not write back the changed file!

 File "C:\Python27\lib\zipfile.py", line 1033, in write
    st = os.stat(filename)
WindowsError: [Error 3] The system cannot find the path specified: u'<?xml version="1.0"?   >\n<ComponentData toolVersion="V6.1.4" schemaVersion="6.1.0.0">\n\t<DataSet name="Bank1">...
  • When you call etree.parse(file), file is just a string. etree does not know that it has to look inside the zip file for that name; it will only look in the current directory. Try:

     doc = etree.parse(zip_file.open(file)) 
  • You also have to skip over directory names -- these will have a trailing slash:

     for filename in zip_file.namelist():
         if filename.endswith('/'):
             # skip directory names
             continue
  • To update the zip file, use:

     zip_file.writestr(filename, resultfile) 
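
Putting the three points together, here is a minimal sketch rather than a definitive implementation. It assumes Python 3 and writes the result to a second archive, because writestr raises an error on a ZipFile opened with mode 'r', and appending to the original would leave duplicate member names. The function name filter_zip, its parameters, and the transform callable (bytes in, bytes out) are hypothetical names introduced for illustration.

```python
import zipfile


def filter_zip(src_path, dst_path, transform, suffix=".amd"):
    """Copy src_path to dst_path, passing matching members through transform.

    transform is a hypothetical callable that takes the member content as
    bytes and returns the new content as bytes.
    """
    processed = []
    with zipfile.ZipFile(src_path, "r") as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.endswith("/"):
                # Directory entry: there is nothing to read, so skip it.
                continue
            # Members inside folders are read by their full archive path.
            data = src.read(info.filename)
            if info.filename.endswith(suffix):
                data = transform(data)
                processed.append(info.filename)
            # writestr takes the member name and its content, not a disk path.
            dst.writestr(info.filename, data)
    return processed
```

With lxml, and reusing the transform object from the question, the callable could be something like lambda data: etree.tostring(transform(etree.fromstring(data))). Note that this sketch drops empty directory entries from the output archive, and that the archives are closed once by the with statement rather than inside the loop.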
