简体   繁体   中英

run python code on a folder of files at once

I have a script that extracts data elements from XML files. I would like to run this on a directory(folder) of XMLs rather than a single one. This is what I have so far:

from xml.dom import minidom
from datetime import *
import os
import glob

filename = glob.glob("*.xml")
f = open(filename)
for xml in f:
    print (xml)
    xmldoc = minidom.parse(xml)
    tcd = xmldoc.getElementsByTagName("QualityMeasureDocument")[0]
    sport = activitiesElement.attributes["root"]
    sportName = sport.value
    print (sportName)


I am getting this error:

Traceback (most recent call last):
File "C:/Python34/Scripts/process.py", line 7, in <module>
f = open(filename)
TypeError: invalid file: ['CMS9v2.xml', 'country_data.xml', 'test.xml']
activitiesElement = tcd.getElementsByTagName("id")[0]


It would be nice to make this into a function as well.

glob.glob returns a list of filenames. You are treating a list as file. try it this way

filenames = glob.glob("*.xml")
for filename in filenames:
     f = open(filename)
     ...

Extract your current parsing to a function:

def parsefile (filename):
    f = open(filename) 
    for xml in f: 
        print (xml) 
        xmldoc = minidom.parse(xml) 
        tcd = xmldoc.getElementsByTagName("QualityMeasureDocument")[0] 
        sport = activitiesElement.attributes["root"]
        sportName = sport.value 
        print (sportName)

Call it:

for file in glob.glob(*.xml):
    parsefile (file)

In general, all you need to change to make part of a python script a function is to indent it and add a line

def functionname (var1, var2... ):

Where var1 etc. are names defined earlier that it relies upon.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM