Parsing all files in a folder with Python, except the files listed in an XML file

Question

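Newbie programmer, new to Python.

What I have:

A folder that contains other folders (modules) and files (could be .txt, .c, .h, .py, etc.)
An XML file that basically contains the configuration of that folder (module name, short name, and an exclude list; whatever is in the exclude list must not be taken into account)

What I intend to do:

Read the information from the XML file and store it in a structure that helps me parse correctly
Parse all files in the given folder, except the excluded ones

So far my code looks like this: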

<?xml version="1.0"?>
<Modules>
    <Module>
        <Name>MOD_Test1</Name>
        <Shortname>1</Shortname>
        <ExcludeList>
            <File>HeaderFile.h</File>
            <File>CFile.c</File>
        </ExcludeList>
    </Module>
    <Module>
        <Name>MOD_Test2</Name>
        <Shortname>2</Shortname>
        <ExcludeList>
            <File>TextFile.txt</File>
        </ExcludeList>
    </Module>
</Modules>

That, obviously, is the XML file.

import xml.etree.ElementTree as ET

def GetExceptFiles(ListOfExceptFiles = []):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            if Attribute.tag=='Shortname':
                ModuleShortName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                print ('In module {} we must exclude {}'.format(ModuleName, ExceptFileName))
        if ExceptFileName is not None:        
            ListOfExceptFiles.append(ExceptFileName)

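This reads the XML file and gives me the list of files that must be excluded. It does the job, but badly: suppose two modules contain a file with exactly the same name, and the file is excluded in one module but not in the other. Both files will be skipped.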

import os

def Parse(walk_dir):
    print('walk_dir = ' + walk_dir)
    for root, subdirs, files in os.walk(walk_dir):
        print('-------------------------------------------------------\nroot = ' + root)
        for filename in files:
            with open(os.path.join(root, filename), 'r') as src:
                Text = src.read()
                # Concatenate inside print(): print() returns None, so print(...) + Text raises TypeError
                print('\nFile %s contains: \n' % filename + Text)

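Now for the parsing, this is what I started with. I know it doesn't actually parse anything yet, but once I can read the contents of a file, I can of course do other things with it.

As for skipping the excluded files, all I did was add an IF statement inside the second FOR loop: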

for filename in files:
    if filename not in ListOfExceptFiles:
        with open(os.path.join(root, filename), 'r') as src:
            Text = src.read()

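These are the two things it does wrong:

Files with the same name will corrupt the output.
If more than one file is excluded for a module in the XML, only the last one is skipped. (In my example, HeaderFile.h will not be skipped, while CFile.c will be.)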
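A minimal reproduction of the second problem: the loop below follows the same structure as GetExceptFiles above, but reads the sample configuration from a string instead of Config.xml (an assumption made so that the snippet runs on its own; collect_excludes_as_posted is a hypothetical name).

```python
import xml.etree.ElementTree as ET

# The sample config from the question, inlined instead of reading Config.xml
CONFIG = """<Modules>
    <Module>
        <Name>MOD_Test1</Name>
        <Shortname>1</Shortname>
        <ExcludeList>
            <File>HeaderFile.h</File>
            <File>CFile.c</File>
        </ExcludeList>
    </Module>
    <Module>
        <Name>MOD_Test2</Name>
        <Shortname>2</Shortname>
        <ExcludeList>
            <File>TextFile.txt</File>
        </ExcludeList>
    </Module>
</Modules>"""


def collect_excludes_as_posted(root):
    """Same loop structure as GetExceptFiles, without the file I/O."""
    excluded = []
    for module in root:
        except_file_name = None
        for attribute in module:
            for file_elem in attribute:
                except_file_name = file_elem.text  # overwritten on every <File>
        # The append sits outside the attribute loop, so only the last
        # <File> seen in each module ends up in the list.
        if except_file_name is not None:
            excluded.append(except_file_name)
    return excluded


print(collect_excludes_as_posted(ET.fromstring(CONFIG)))
# ['CFile.c', 'TextFile.txt']
```

HeaderFile.h never makes it into the list, which is exactly why it is not skipped.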

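EDIT: @bracco23's answer got me thinking, although I did not manage to map multiple lists keyed by the module name (still looking for help with that, if possible).
This is what I started with, based on the list-of-lists idea: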

import xml.etree.ElementTree as ET

def ReadConfig(Tuples = []):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag == 'Name':
                ModuleName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                # One (module, excluded file) pair per <File> element
                Tuple = (ModuleName, ExceptFileName)
                Tuples.append(Tuple)
    return Tuples

Is this a good approach?

Answer 1

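This works reasonably well; here are just a few minor tweaks to fix the problems:

1) In GetExceptFiles(ListOfExceptFiles = []), you add the file to the list at the end of the for loop over Attribute, so only the last file of each module gets added. If you move the check inside the loop over the files, every excluded file is added to the list (a few tabs/spaces of indentation are enough).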

import xml.etree.ElementTree as ET

def GetExceptFiles(ListOfExceptFiles = []):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            if Attribute.tag=='Shortname':
                ModuleShortName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                print ('In module {} we must exclude {}'.format(ModuleName, ExceptFileName))
                if ExceptFileName is not None:        
                    ListOfExceptFiles.append(ExceptFileName)

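Also, you assume that an attribute's tag can only be Name, Shortname or ExcludeList. That is true for now, but a malformed file will break your parsing. Consider checking the tag of every attribute and raising an error when something is wrong.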
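A minimal sketch of such a check, assuming the only valid children of a Module are Name, Shortname and ExcludeList (as in the sample config) and that an ExcludeList contains only File elements. read_modules_strict and the use of ValueError are illustrative choices, not part of the original code. It collects the excluded files per module and stores them only after the whole Module element has been read.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the only valid children of <Module>
ALLOWED_TAGS = {"Name", "Shortname", "ExcludeList"}


def read_modules_strict(root):
    """Return {module name: [excluded files]}, raising ValueError on unexpected tags."""
    result = {}
    for module in root:
        if module.tag != "Module":
            raise ValueError("unexpected <%s> under <%s>" % (module.tag, root.tag))
        name = None
        excluded = []
        for attribute in module:
            if attribute.tag not in ALLOWED_TAGS:
                raise ValueError("unexpected <%s> in <Module>" % attribute.tag)
            if attribute.tag == "Name":
                name = attribute.text
            elif attribute.tag == "ExcludeList":
                for file_elem in attribute:
                    if file_elem.tag != "File":
                        raise ValueError("unexpected <%s> in <ExcludeList>" % file_elem.tag)
                    excluded.append(file_elem.text)
        if name is None:
            raise ValueError("<Module> without a <Name>")
        # Stored only after the whole <Module> has been read
        result[name] = excluded
    return result


root = ET.fromstring(
    "<Modules><Module><Name>MOD_Test1</Name><Shortname>1</Shortname>"
    "<ExcludeList><File>HeaderFile.h</File><File>CFile.c</File></ExcludeList>"
    "</Module></Modules>"
)
print(read_modules_strict(root))
# {'MOD_Test1': ['HeaderFile.h', 'CFile.c']}
```

A typo such as <Nmae> now stops the program with a clear message instead of silently producing a wrong exclude list.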

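2) I assume that files with the same name are actually the same file shared between modules, which is excluded in some modules but not in all of them. If so, the flat list of excluded files loses the information about which module each excluded file belongs to. Consider using lists keyed by module name, so that each module has its own list of excluded files.

EDIT: Using a dict (I'm mostly Java-oriented, and in Java this structure is called a map, but in Python it is a dict), it could look like this: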

import xml.etree.ElementTree as ET

def GetExceptFiles(DictOfExceptFiles = {}):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            if Attribute.tag=='Shortname':
                ModuleShortName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                # Create the module's list the first time the module is seen
                if ModuleName not in DictOfExceptFiles:
                    DictOfExceptFiles[ModuleName] = []
                DictOfExceptFiles[ModuleName].append(ExceptFileName)
                print ('In module {} we must exclude {}'.format(ModuleName, ExceptFileName))

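Note that this assumes ModuleName has been set before the first file is reached, which depends on the order of the child elements, and nothing in the XML guarantees that order. To work around this, I would move the name and the short name from child tags to XML attributes of the Module element, like this: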

<Module name="ModuleName" shortName="short name">
    ...
</Module>
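A sketch of how that layout could be read, and how the per-module lists could then be used while walking the folder. Element.get reads an attribute no matter where the child elements appear, so the ordering problem goes away. The folder layout is an assumption not stated in the question: each module is taken to be a top-level subfolder of walk_dir whose name is exactly the module name. read_config and parse_folder are hypothetical names.

```python
import os
import xml.etree.ElementTree as ET


def read_config(root):
    """Map each module name to the set of file names excluded in that module."""
    excludes = {}
    for module in root.findall("Module"):
        name = module.get("name")  # attribute lookup, independent of child order
        excludes[name] = {f.text for f in module.findall("ExcludeList/File")}
    return excludes


def parse_folder(walk_dir, excludes):
    """Yield (path, text) for every file that is not excluded in its own module."""
    for root, subdirs, files in os.walk(walk_dir):
        rel = os.path.relpath(root, walk_dir)
        # Assumption: the first path component below walk_dir is the module name;
        # files directly in walk_dir belong to no module.
        module = rel.split(os.sep)[0] if rel != os.curdir else None
        skip = excludes.get(module, set())
        for filename in files:
            if filename in skip:
                continue
            path = os.path.join(root, filename)
            with open(path, "r") as src:
                yield path, src.read()


if __name__ == "__main__":
    config = ET.fromstring(
        '<Modules>'
        '<Module name="MOD_Test1" shortName="1"><ExcludeList>'
        '<File>HeaderFile.h</File><File>CFile.c</File></ExcludeList></Module>'
        '<Module name="MOD_Test2" shortName="2"><ExcludeList>'
        '<File>TextFile.txt</File></ExcludeList></Module>'
        '</Modules>'
    )
    excludes = read_config(config)
    # "path/to/folder" is a placeholder for the real folder
    for path, text in parse_folder("path/to/folder", excludes):
        print("File %s contains:\n%s" % (path, text))
```

Because the check is done per module, a CFile.c under MOD_Test2 is still read even though MOD_Test1 excludes its own CFile.c, which is the same-name case from the question.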

Solution 1 (score 0, accepted) 2017-08-22 08:46:23

使用Python解析文件夹中除要在XML文件中键入的文件以外的所有文件

问题描述

1 个解决方案

解决方案1 0 已采纳 2017-08-22 08:46:23

解决方案1
0 已采纳 2017-08-22 08:46:23