
How to return in Python when reading multiple .xml files

I'm writing a Python script that walks through a folder and its subfolders and reads only the XML files; there are more than 100 of them. If I hard-code this logic outside a function, it reads all 100 XML files into temp0. However, if I put the code inside a function and use return, the function always returns the content of only one file, i.e. it reads only one file. Can anybody please explain why return works this way? Thanks in advance.

def raw_input(doc):
    for root, dirs, packs in doc:
        for files in packs:
            if files == 'abc.xml':
                filename = os.path.join(root, files)
                open_file = open(filename, 'r')
                perpX_ = open_file.read()
                # print(perpX_)
                outputX_ = re.compile('<test (.*?)</text>', re.DOTALL | re.IGNORECASE).findall(perpX_)
                temp0 = str("|".join(outputX_))
                #print(temp0)
                return temp0

doc=os.walk('./data/')
raw_input(doc)

temp0 = raw_input(doc)
print(temp0)

return ends the function call: as soon as a return statement is reached, Python exits the function and uses the value of the expression after return as the function's result.

Your return is inside the for loop, so it is reached the first time a file matches. Python treats temp0 as the final result of the call and exits the function right there, so the remaining files are never read.
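To see this behaviour in isolation, here is a minimal sketch without any file handling. The function names first_only and collect_all and the sample list are made up for illustration; they are not part of the original code.

```python
def first_only(items):
    for item in items:
        # return exits the whole function during the first iteration
        return item


def collect_all(items):
    collected = []
    for item in items:
        collected.append(item)
    # return is placed after the loop, so every item is visited first
    return collected


names = ["a.xml", "b.xml", "c.xml"]
print(first_only(names))   # only the first element
print(collect_all(names))  # all three elements
```

The only difference between the two functions is where return sits relative to the loop.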

You could collect the output for every file in a list and return the combined result after the loops have finished, e.g. like this:

import os
import re

def raw_input(doc):
    result = []    # the output for each matching file is collected here
    # compile the pattern once instead of on every iteration
    pattern = re.compile('<test (.*?)</text>', re.DOTALL | re.IGNORECASE)
    for root, dirs, packs in doc:
        for files in packs:
            if files == 'abc.xml':
                filename = os.path.join(root, files)
                # the with block closes the file as soon as it has been read
                with open(filename, 'r') as open_file:
                    perpX_ = open_file.read()
                outputX_ = pattern.findall(perpX_)
                # append the output for the current file to the list
                result.append("|".join(outputX_))
    # return the combined string at the end of the function,
    # AFTER both for loops have finished
    return '|'.join(result)

doc=os.walk('./data/')

temp0 = raw_input(doc)
print(temp0)

This way, you'll get your outputs as a single string.
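If you would rather keep the outputs separate per file, one variation is to return the list itself instead of joining it. The sketch below is only an illustration under stated assumptions: the name extract_per_file, the simplified <text>...</text> pattern and the throwaway directory layout are all hypothetical and not taken from the original code.

```python
import os
import re
import tempfile

# Hypothetical, simplified pattern used only for this illustration.
TEXT_RE = re.compile(r"<text>(.*?)</text>", re.DOTALL | re.IGNORECASE)


def extract_per_file(top, target="abc.xml"):
    """Return one '|'-joined string per file named `target` under `top`."""
    per_file = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name == target:
                with open(os.path.join(root, name), "r") as handle:
                    matches = TEXT_RE.findall(handle.read())
                per_file.append("|".join(matches))
    # the return comes after both loops, so every file has been visited
    return per_file


# Build a tiny temporary tree so the sketch can run anywhere.
with tempfile.TemporaryDirectory() as top:
    samples = (("one", "<text>a</text><text>b</text>"), ("two", "<text>c</text>"))
    for sub, body in samples:
        os.makedirs(os.path.join(top, sub))
        with open(os.path.join(top, sub, "abc.xml"), "w") as handle:
            handle.write(body)
    # os.walk does not guarantee an order, so sort for a stable result
    results = sorted(extract_per_file(top))

print(results)
```

Returning a list leaves it up to the caller whether to join the entries, count them, or process them one by one.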

Alternatively, you can use a generator. A generator is an iterator that produces its values one at a time, so you can make your code evaluate lazily (on demand):

import os
import re

# now raw_input is a generator function
def raw_input(doc):
    # no list is needed for storage now
    pattern = re.compile('<test (.*?)</text>', re.DOTALL | re.IGNORECASE)
    for root, dirs, packs in doc:
        for files in packs:
            if files == 'abc.xml':
                filename = os.path.join(root, files)
                with open(filename, 'r') as open_file:
                    perpX_ = open_file.read()
                outputX_ = pattern.findall(perpX_)
                # yield the current value; the function pauses here
                # until the next value is requested
                yield "|".join(outputX_)

doc = os.walk('./data/')
results = raw_input(doc)
# results is now a generator; nothing has been evaluated yet
# you can get the first output like this
# (next raises StopIteration if no matching file is left):
first_out = next(results)
# and then the second:
second_out = next(results)
# or iterate over it, just like over an ordinary list:
for res in results:
    print(res)
# note that the loop only covers the remaining values
# (not the first and second ones, which have already been consumed)

# and now results is exhausted (we've reached the end of the generator)
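The one-pass nature of generators is easy to check on a small example. The sketch below uses a made-up generator called numbered rather than the file-reading code, so it runs without any data folder. The same one-pass rule applies to the object returned by os.walk, which is why passing the same doc to the function a second time does not start the walk over.

```python
def numbered(names):
    # a generator function: each yield hands back one value and pauses
    for index, name in enumerate(names, start=1):
        yield f"{index}:{name}"


gen = numbered(["a.xml", "b.xml", "c.xml"])

first = next(gen)           # consumes the first value only
rest = list(gen)            # consumes everything that is left
leftover = next(gen, None)  # the default is returned instead of raising StopIteration

print(first)
print(rest)
print(leftover)
```

If the values are needed more than once, storing them with list(...) or calling the generator function again with a fresh os.walk(...) are the usual options.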
