
Parse all files in a folder except those listed in an XML file, using Python

Newbie programmer with a Python situation for you.

What I have:

  1. a folder that contains other folders (modules) and files (.txt, .c, .h, .py, etc.)
  2. an XML file that holds the configuration of that folder (module name, short name, and an exclude list; files in the exclude list must not be taken into consideration)

What I intend to do :

  • read the information from the XML file and save it in a manner that helps me parse properly
  • parse all files from the given folder except the ones being excluded

My code so far looks like this:

<?xml version="1.0"?>
<Modules>
    <Module>
        <Name>MOD_Test1</Name>
        <Shortname>1</Shortname>
        <ExcludeList>
            <File>HeaderFile.h</File>
            <File>CFile.c</File>
        </ExcludeList>
    </Module>
    <Module>
        <Name>MOD_Test2</Name>
        <Shortname>2</Shortname>
        <ExcludeList>
            <File>TextFile.txt</File>
        </ExcludeList>
    </Module>
</Modules>

That's obviously the XML file

def GetExceptFiles(ListOfExceptFiles = []):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            if Attribute.tag=='Shortname':
                ModuleShortName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                print ('In module {} we must exclude {}'.format(ModuleName, ExceptFileName))
        if ExceptFileName is not None:        
            ListOfExceptFiles.append(ExceptFileName) 

This one reads the XML file and gives me the list of files that must be excluded. It does the job, but poorly: if two modules each have a file with exactly the same name and only one of them is excluded, both will be skipped.

import os

def Parse(walk_dir):
    print('walk_dir = ' + walk_dir)
    for root, subdirs, files in os.walk(walk_dir):
        print('-------------------------------------------------------\nroot = ' + root)
        for filename in files:
            with open(os.path.join(root, filename), 'r') as src:
                Text = src.read()
                # Build the whole message before printing (works in Python 2 and 3)
                print('\nFile %s contains: \n' % filename + Text)

Now for parsing this is what I've started with. It does not parse, I know, but once I can read the content of the file then I can certainly do other things too.

As for skipping the excluded files, all I did was add an if statement inside the second for loop:

for filename in files:
    if filename not in ListOfExceptFiles:
        with open(os.path.join(root, filename), 'r') as src:

These are the two things it doesn't do right:

  1. files that share a name are all skipped, even when only one of them is excluded.
  2. having more than one excluded file in the XML (for one module) results in only the last one being skipped (in my example HeaderFile.h will not be skipped and CFile.c will).

EDIT: @bracco23's answer got me thinking. I haven't succeeded in mapping multiple lists with the module name as a key (still looking for help with this if you can), but this is what I've got, starting from the idea of a list of lists:

def ReadConfig(Tuples = []):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                Tuple = (ModuleName, ExceptFileName)
                Tuples.append(Tuple)

Is this a good approach?

The work is quite good; there is just a minor list of tweaks needed to solve the issues:

1) In your GetExceptFiles(ListOfExceptFiles = []) you add the file to the list after the for loop over Attribute has finished, so only the last file is added. If you move the check inside the for loop over the files, all excluded files are added to the list (a couple of tabs/spaces is enough):

def GetExceptFiles(ListOfExceptFiles = []):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            if Attribute.tag=='Shortname':
                ModuleShortName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                print ('In module {} we must exclude {}'.format(ModuleName, ExceptFileName))
                if ExceptFileName is not None:        
                    ListOfExceptFiles.append(ExceptFileName) 

Also, you assume that the tag of the attributes can only be Name, Shortname or ExcludeList. While this is, and surely will be, the case for your own files, a malformed file will break your parsing. Consider checking the tag property of all attributes and raising an error if something is wrong.
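As a rough illustration of that check, here is a minimal sketch. The function name check_module_tags and the ALLOWED_TAGS set are made up for this example, and it takes the XML as a string rather than reading Config.xml, so adapt it to your own code:

```python
import xml.etree.ElementTree as ET

# Tags seen in the example Config.xml; anything else is treated as an error.
ALLOWED_TAGS = {"Name", "Shortname", "ExcludeList"}

def check_module_tags(xml_text):
    """Raise ValueError if a <Module> has an unexpected child or no <Name>."""
    root = ET.fromstring(xml_text)
    for module in root.findall("Module"):
        for child in module:
            if child.tag not in ALLOWED_TAGS:
                raise ValueError("Unexpected tag <{}> in a Module".format(child.tag))
        if module.findtext("Name") is None:
            raise ValueError("A Module has no <Name>")
```

Calling it once before the real parsing means the rest of the code can rely on the structure being what it expects.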

2) I am assuming that the files with the same name are actually the same file shared among modules, excluded in some modules but not in all. If this is the case, then a single list of excluded files loses the information about which module each exclusion belongs to. Consider using a map of lists with the name of the module as a key, so that each module can have its own list of excluded files.
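To show what the per-module mapping buys you, here is a sketch of how it could be consulted while walking the folder. It assumes that every module is a direct subfolder of walk_dir named exactly like the module (the question does not say this, so treat it as a guess); files_to_parse and excluded_by_module are names invented for the example:

```python
import os

def files_to_parse(walk_dir, excluded_by_module):
    """Yield paths under walk_dir, skipping each module's own excluded files.

    Assumption: every module is a direct subfolder of walk_dir named after
    the module, e.g. walk_dir/MOD_Test1/...
    """
    for root, subdirs, files in os.walk(walk_dir):
        rel = os.path.relpath(root, walk_dir)
        # The first path component below walk_dir is taken as the module name.
        module_name = rel.split(os.sep)[0]
        excluded = excluded_by_module.get(module_name, [])
        for filename in files:
            if filename not in excluded:
                yield os.path.join(root, filename)
```

With {'MOD_Test1': ['CFile.c']} as the mapping, a CFile.c inside MOD_Test1 is skipped while a CFile.c inside MOD_Test2 is still returned.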

EDIT: A way to use a dict (I'm mostly Java-oriented and this structure is called a map in Java, but in Python it is a dict) could be:

def GetExceptFiles(DictOfExceptFiles = {}):
    tree = ET.ElementTree(file='Config.xml')
    Modules = tree.getroot()
    for Module in Modules:
        for Attribute in Module:
            if Attribute.tag=='Name':
                ModuleName = Attribute.text
            if Attribute.tag=='Shortname':
                ModuleShortName = Attribute.text
            for File in Attribute:
                ExceptFileName = File.text
                if ModuleName not in DictOfExceptFiles:
                    DictOfExceptFiles[ModuleName] = []
                DictOfExceptFiles[ModuleName].append(ExceptFileName)
                print ('In module {} we must exclude {}'.format(ModuleName, ExceptFileName))

Note that this assumes ModuleName has already been set before the first file is read, which depends on the order of the child elements in the file, and nothing forces whoever writes the file to put Name before ExcludeList. To solve this, I would move the name and the short name from being child tags to being XML attributes of the module, like this:

<Module name="ModuleName" shortName="short name">
    ...
</Module>
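Reading that attribute-based layout could look roughly like the sketch below. It assumes the ExcludeList/File children stay as in the original Config.xml and only the name moves to an attribute; read_excludes is a hypothetical helper name, and it takes the XML as a string to keep the example self-contained:

```python
import xml.etree.ElementTree as ET

def read_excludes(xml_text):
    """Return {module name: [excluded file names]} from attribute-style XML."""
    excludes = {}
    for module in ET.fromstring(xml_text).findall("Module"):
        # XML attribute: available no matter how the child elements are ordered.
        name = module.get("name")
        excludes[name] = [f.text for f in module.findall("ExcludeList/File")]
    return excludes

config = """
<Modules>
    <Module name="MOD_Test1" shortName="1">
        <ExcludeList>
            <File>HeaderFile.h</File>
            <File>CFile.c</File>
        </ExcludeList>
    </Module>
    <Module name="MOD_Test2" shortName="2">
        <ExcludeList>
            <File>TextFile.txt</File>
        </ExcludeList>
    </Module>
</Modules>
"""
print(read_excludes(config))
# {'MOD_Test1': ['HeaderFile.h', 'CFile.c'], 'MOD_Test2': ['TextFile.txt']}
```

Because the module name is read with get() before any child is visited, the order of the children no longer matters.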


 