
Parsing every file in a directory in Python?

So I have this code:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()

for segment in root.iter("s"):
    for word in segment.iter("w"):
        print word.text,
    print "\n"

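This parses the XML file test.xml and prints the parsed output. However, I have a large number of these XML files in a directory that need to be parsed. How can I modify the code so that it loops over every file in the directory and applies this to each one?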

Thanks!

This should work:

import xml.etree.ElementTree as ET

def printParsed(filename):
    tree = ET.parse(filename)
    root = tree.getroot()

    for segment in root.iter("s"):
        for word in segment.iter("w"):
            print word.text,
        print "\n"

if __name__ == "__main__":
    from os import listdir
    from os.path import isfile, join
    mypath = 'path/to/your/xml/files'
    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    for f in onlyfiles:
        # only does stuff if the file name ends in .xml
        if f.endswith('.xml'):
            # listdir returns bare names, so prepend the directory
            printParsed(join(mypath, f))

You can save the file as parser.py and then run it with python parser.py. If you like, you can also remove the if __name__ == "__main__" part.
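For readers on Python 3, where print is a function, the same idea can be sketched with pathlib. This is only an illustration, not part of the answer above: the helper names sentences and parse_directory are made up here, and the functions return the text instead of printing it so the result is easier to reuse.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def sentences(xml_path):
    """Return one string per <s> element, joining the text of its <w> elements."""
    root = ET.parse(str(xml_path)).getroot()
    lines = []
    for segment in root.iter("s"):
        # word.text is None for an empty <w/>, so fall back to ""
        words = [word.text or "" for word in segment.iter("w")]
        lines.append(" ".join(words))
    return lines


def parse_directory(directory):
    """Map each *.xml file name in `directory` to its list of sentences."""
    results = {}
    for path in sorted(Path(directory).glob("*.xml")):
        results[path.name] = sentences(path)
    return results


if __name__ == "__main__":
    # Placeholder path taken from the answer above; replace it with a real one.
    for name, lines in parse_directory("path/to/your/xml/files").items():
        print(name)
        for line in lines:
            print(line)
```

Path.glob("*.xml") does the extension filtering, so no manual check of the file name is needed in this variant.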

Use os.listdir(path)

It returns a list with the names of all entries in the directory (files and subdirectories alike).
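Because os.listdir gives back bare names rather than full paths, and includes subdirectories, it usually needs to be combined with os.path.join and os.path.isfile. A minimal Python 3 sketch of that combination follows; list_files is a hypothetical helper name, not something from the standard library.

```python
import os


def list_files(directory):
    """Return full paths of the regular files directly inside `directory`."""
    paths = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)  # listdir yields bare names
        if os.path.isfile(full):              # skip subdirectories
            paths.append(full)
    return paths
```

The joined paths can be passed to ET.parse regardless of the current working directory.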

Code:

import xml.etree.ElementTree as ET
import os

# names of every entry in the current directory
listofxml = os.listdir("./")
for xml in listofxml:
    tree = ET.parse(xml)
    root = tree.getroot()

    for segment in root.iter("s"):
        for word in segment.iter("w"):
            print word.text,
        print "\n"

If not all of the files are XML, you can split the file name and check the extension:

import xml.etree.ElementTree as ET
import os

listofxml = os.listdir("./")
for xml in listofxml:
    # the last dot-separated piece of the name is the extension
    parts = xml.split('.')
    if parts[-1] == 'xml':
        tree = ET.parse(xml)
        root = tree.getroot()

        for segment in root.iter("s"):
            for word in segment.iter("w"):
                print word.text,
            print "\n"
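One caveat with splitting on '.': a file literally named xml, with no dot at all, also passes the check. A possible alternative, sketched here for Python 3, is os.path.splitext, combined with catching ET.ParseError so that one malformed file does not stop the whole run. The names is_xml_name and parse_all are hypothetical, and treating the extension case-insensitively is an assumption of this sketch rather than something the answer requires.

```python
import os
import xml.etree.ElementTree as ET


def is_xml_name(name):
    """True if the file name has a .xml extension (case-insensitive)."""
    return os.path.splitext(name)[1].lower() == ".xml"


def parse_all(directory):
    """Parse every .xml file in `directory`; return (trees, failed_names)."""
    trees, failed = {}, []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if not is_xml_name(name) or not os.path.isfile(full):
            continue
        try:
            trees[name] = ET.parse(full)
        except ET.ParseError:
            # Malformed XML: remember the name and move on.
            failed.append(name)
    return trees, failed
```

Each value in trees is an ElementTree, so the root.iter("s") loop from the code above can be applied to trees[name].getroot().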

