Parse all files in a folder except those listed in an XML file, using Python

Newbie programmer with a Python situation for you.

What I have:

  1. a folder which contains other folders (modules) and files (be it .txt, .c, .h, .py, etc.)
  2. an XML file which basically contains the configuration of that folder (module name, short name, and also an exclude list; files on the exclude list must not be taken into consideration)

What I intend to do:

  • read the information from the XML file and save it in a manner that helps me parse properly
  • parse all files from the given folder except the ones being excluded

My code so far looks like this:

<?xml version="1.0"?>
<Modules>
    <Module>
        <Name>MOD_Test1</Name>
        <Shortname>1</Shortname>
        <ExcludeList>
            <File>HeaderFile.h</File>
            <File>CFile.c</File>
        </ExcludeList>
    </Module>
    <Module>
        <Name>MOD_Test2</Name>
        <Shortname>2</Shortname>
        <ExcludeList>
            <File>TextFile.txt</File>
        </ExcludeList>
    </Module>
</Modules>

That's obviously the XML file.

def GetExceptFiles(ListOfExceptFiles = []):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            if Attribute.tag=='Shortname':
                ModuleShortName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                print ('In module {} we must exclude {}'.format(ModuleName, ExceptFileName))
        if ExceptFileName is not None:        
            ListOfExceptFiles.append(ExceptFileName) 

This one reads the XML file and gives me the list of files that must be excluded. It does the job, but poorly. Let's say two modules each have a file with exactly the same name, and one is excluded while the other is not. They'll both be skipped.

def Parse(walk_dir):
    print('walk_dir = ' + walk_dir)
    for root, subdirs, files in os.walk(walk_dir):
        print('-------------------------------------------------------\nroot = ' + root)
        for filename in files:
            with open(os.path.join(root, filename), 'r') as src:
                Text = src.read()
                print('\nFile %s contains: \n' % filename + Text)

Now for parsing, this is what I've started with. It does not parse yet, I know, but once I can read the content of a file I can certainly do other things too.

As for removing the excluded files, all I did was add an if statement to the second for loop:

for filename in files:
    if filename not in ListOfExceptFiles:
        with open(os.path.join(root, filename), 'r') as src:

These are the two things that it doesn't do right:

  1. Files with the same name will corrupt the output.
  2. Having more than one excluded file in the XML (for one module) results in only the last one being skipped (in my example HeaderFile.h will not be skipped and CFile.c will).

EDIT: @bracco23's answer got me thinking, and though I haven't succeeded in mapping multiple lists with the module name as a key (still looking for help on this if you can), this is what I've got, starting from the idea of a list of lists:

def ReadConfig(Tuples = []):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                Tuple = (ModuleName, ExceptFileName)
                Tuples.append(Tuple)

Is this a good approach?

The work is quite good; there is just a short list of minor tweaks needed to solve the issues:

1) In your GetExceptFiles(ListOfExceptFiles = []) you add the file to the list at the end of the for over Attribute, so only the last file is added. If you move the check inside the for over the files, all excluded files are added to the list. (A couple of tabs/spaces is enough.)

def GetExceptFiles(ListOfExceptFiles = []):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            if Attribute.tag=='Shortname':
                ModuleShortName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                print ('In module {} we must exclude {}'.format(ModuleName, ExceptFileName))
                if ExceptFileName is not None:        
                    ListOfExceptFiles.append(ExceptFileName) 

Also, you assume that the tag of each child element (your Attribute variable) can only be Name, Shortname or ExcludeList. While this should normally be the case, a malformed file will break your parsing. Consider checking the tag property of every child element and raising an error if something is wrong.
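A minimal sketch of such a check, assuming the tag set seen in the sample Config.xml. The function name check_module_tags and the choice of ValueError are illustrative and not part of the original code:

```python
import xml.etree.ElementTree as ET

# Tags that appear inside <Module> in the sample Config.xml (assumed complete).
ALLOWED_TAGS = {'Name', 'Shortname', 'ExcludeList'}

def check_module_tags(modules_root):
    """Raise ValueError if a <Module> has an unexpected child tag or no <Name>."""
    for module in modules_root:
        if module.tag != 'Module':
            raise ValueError('Unexpected element <{}> under the root'.format(module.tag))
        for child in module:
            if child.tag not in ALLOWED_TAGS:
                raise ValueError('Unexpected tag <{}> in a module'.format(child.tag))
        if module.find('Name') is None:
            raise ValueError('A <Module> is missing its <Name>')

good = ET.fromstring(
    '<Modules><Module><Name>MOD_Test1</Name><Shortname>1</Shortname>'
    '<ExcludeList><File>CFile.c</File></ExcludeList></Module></Modules>'
)
check_module_tags(good)  # a well-formed config passes silently
```

Running the check once, right after getroot(), keeps the parsing loop itself free of error handling.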

2) I am assuming that files with the same name are actually the same file shared among modules, excluded in some modules but not in all. If this is the case, a single list of excluded files loses the information about which module each exclusion belongs to. Consider using a map of lists with the module name as the key, so that each module can have its own list of excluded files.

EDIT: A way to use a dict (I'm mostly Java-oriented, and this structure is called a map in Java, but in Python it is a dict) could be:

def GetExceptFiles(DictOfExceptFiles = {}):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            if Attribute.tag=='Shortname':
                ModuleShortName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                if ModuleName not in DictOfExceptFiles:
                    DictOfExceptFiles[ModuleName] = []
                DictOfExceptFiles[ModuleName].append(ExceptFileName)
                print ('In module {} we must exclude {}'.format(ModuleName, ExceptFileName))

Note that this assumes ModuleName has already been set before the first file is read, which depends on the order of the child elements, and nothing in the XML format itself forces Name to come before ExcludeList. To solve this, I would move the name and the short name from child tags to XML attributes of the module, like this:

<Module name="ModuleName" shortName="short name">
    ...
</Module>
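With that layout, the per-module exclusions can be read without depending on element order. The following is only a sketch under stated assumptions: the helper names read_excludes and files_to_parse are made up for illustration, and it assumes each module is a folder directly below the walked directory, named exactly like the module's name attribute.

```python
import os
import xml.etree.ElementTree as ET
from collections import defaultdict

def read_excludes(xml_text):
    """Map each module name to the set of file names excluded for that module."""
    excludes = defaultdict(set)
    for module in ET.fromstring(xml_text).iter('Module'):
        name = module.get('name')  # XML attribute, so child order no longer matters
        for file_elem in module.iter('File'):
            if file_elem.text:
                excludes[name].add(file_elem.text.strip())
    return dict(excludes)

def files_to_parse(walk_dir, excludes):
    """Yield file paths under walk_dir, skipping files excluded for their own module.

    Assumption: the module of a file is the first folder below walk_dir.
    """
    for root, _subdirs, files in os.walk(walk_dir):
        rel = os.path.relpath(root, walk_dir)
        module = rel.split(os.sep)[0]  # '.' for files directly in walk_dir
        skipped = excludes.get(module, set())
        for filename in files:
            if filename not in skipped:
                yield os.path.join(root, filename)

CONFIG = """<Modules>
    <Module name="MOD_Test1" shortName="1">
        <ExcludeList><File>HeaderFile.h</File><File>CFile.c</File></ExcludeList>
    </Module>
    <Module name="MOD_Test2" shortName="2">
        <ExcludeList><File>TextFile.txt</File></ExcludeList>
    </Module>
</Modules>"""

excludes = read_excludes(CONFIG)
```

Because the lookup is keyed by module, a CFile.c inside MOD_Test2 would still be parsed even though CFile.c is excluded for MOD_Test1. Returning the dict instead of filling a default argument also avoids sharing one mutable default between calls.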
