Breaking the loop properly in Python

Question

I am trying to upload a set of files via an API call. The files have sequential names: part0.xml, part1.xml, etc. My code loops through all the files and uploads them properly, but it doesn't break out of the loop: after it uploads the last available file in the directory, I get an error:

No such file or directory.

I don't really understand how to make it stop as soon as the last file in the directory is uploaded. It's probably a very dumb question, but I am really lost. How do I stop it from looping through non-existent files?

The code:

part = 0
with open('part%d.xml' % part, 'rb') as xml:

    #here goes the API call code

part +=1

I also tried something like this:

import glob
part = 0
for fname in glob.glob('*.xml'):
    with open('part%d.xml' % part, 'rb') as xml:

        #here goes the API call code

    part += 1

Edit: Thank you all for the answers, learned a lot. Still lots to learn. :)

Answer 1

Alternatively, you can simply use a regex.

import os
import re

# Anchor the pattern at both ends so names like 'xpart1.xml' are skipped
files = [f for f in os.listdir() if re.search(r'^part\d+\.xml$', f)]
for f in files:
    pass  # process..

This will be really useful in case you require advanced filtering.

Note: you can do similar filtering using the list returned by glob.glob().
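
As a rough sketch of that note, the list from glob.glob() can be narrowed with the same kind of regex. The helper name part_files and the compiled pattern PART_RE are illustrative assumptions, not part of the question's code. re.fullmatch is used so that names such as xpart1.xml or part1.xml.bak are rejected.

```python
import glob
import os
import re

# Hypothetical pattern: 'part', one or more digits, then '.xml'.
PART_RE = re.compile(r'part\d+\.xml')


def part_files(directory='.'):
    """Return paths in `directory` whose base name looks like part<N>.xml."""
    # glob.escape protects any wildcard characters in the directory name itself.
    candidates = glob.glob(os.path.join(glob.escape(directory), '*.xml'))
    return [p for p in candidates if PART_RE.fullmatch(os.path.basename(p))]
```

The returned list is in whatever order glob produced, so sort it if the upload order matters.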

If you are not familiar with list comprehensions and regexes, I would recommend reading:

Regex - howto
List Comprehensions

Answer 2

You almost had it. This is your code with some stuff removed:

import glob

for fname in glob.glob('part*.xml'):
    with open(fname, 'rb') as xml:
        # here goes the API call code
        pass

It is possible to make the glob more specific, but as it is, it already solves the "foo.xml" problem. The key is to avoid manual counters in Python; the idiomatic iteration is for x in y: and you don't need a counter.

glob does not guarantee any particular order (it depends on the file system), so wrap the call in sorted() if the order matters. Remember, however, that ['part1', 'part10', 'part2'] is already in alphabetical order, so a plain sort will not give you numeric order. There are a few ways to cope with that, but that would be a separate question.
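
One possible way to cope with the part1 / part10 / part2 ordering is to sort on the number embedded in the name. This is only a sketch: the helper part_number is a made-up name, and it assumes every file follows the part<N>.xml scheme from the question.

```python
import re


def part_number(filename):
    """Return the integer in a name like 'part12.xml' (assumed naming scheme)."""
    match = re.search(r'part(\d+)\.xml$', filename)
    if match is None:
        raise ValueError('unexpected file name: %r' % filename)
    return int(match.group(1))


names = ['part1.xml', 'part10.xml', 'part2.xml']
ordered = sorted(names, key=part_number)
print(ordered)  # ['part1.xml', 'part2.xml', 'part10.xml']
```

The same key function can be passed to sorted(glob.glob('part*.xml'), key=part_number).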

Answer 3

Consider what happens if there are other files that match '*.xml'.

Suppose that you have 11 files, "part0.xml" through "part10.xml", but also a file called "foo.xml".

Then the for loop will iterate 12 times (since there are 12 matches for the glob). On the 12th iteration, you are trying to open "part11.xml", which doesn't exist.

One approach is to drop the glob and just handle the exception.

part = 0
while True:
    try:
        with open('part%d.xml' % part, 'rb') as xml:
            # here goes the API call code
            pass
        part += 1
    except IOError:
        # the next numbered file could not be opened, so stop
        break

Answer 4

When you use a counter, you need to test whether the file exists:

import os
from itertools import count

for part in count():
    filename = 'part%d.xml' % part
    if not os.path.exists(filename):
        break
    with open(filename) as inp:
        # do something
        pass

Answer 5

Your for loop is saying "for every file that ends with .xml"; if you have any file that ends with .xml that isn't a sequential part%d.xml, you're going to get an error. Imagine you have part0.xml and foo.xml. The for loop is going to loop twice; on the second loop, it's going to try to open part1.xml, which doesn't exist.

Since you know the filenames already, you don't even need to use glob.glob(); just check if each file exists before opening it, until you find one that doesn't exist.

import os

from itertools import count


filenames = ('part%d.xml' % part_num for part_num in count())

for filename in filenames:
    if os.path.exists(filename):
        with open(filename, 'rb') as xmlfile:
            do_stuff(xmlfile)
            # here goes the API call code
    else:
        break

If for any reason you're worried about files disappearing between os.path.exists(filename) and open(filename, 'rb'), this code is more robust:

import os

from itertools import count


filenames = ('part%d.xml' % part_num for part_num in count())

for filename in filenames:
    try:
        xmlfile = open(filename, 'rb')
    except IOError:
        break
    else:
        with xmlfile:
            do_stuff(xmlfile)
            # here goes the API call code
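
For reuse, the same loop could be wrapped in a function, roughly as sketched below. The names process_parts and handle are hypothetical; handle stands in for the API call code. This variant catches FileNotFoundError (Python 3.3+) rather than IOError, on the assumption that other failures, such as a permission error, should surface instead of silently ending the loop.

```python
import os
from itertools import count


def process_parts(directory, handle):
    """Call handle(fileobj) for part0.xml, part1.xml, ... until one is missing.

    `handle` is a hypothetical stand-in for the API call code.
    Returns the number of files processed.
    """
    for part_num in count():
        path = os.path.join(directory, 'part%d.xml' % part_num)
        try:
            xmlfile = open(path, 'rb')
        except FileNotFoundError:
            # First missing number: assume the sequence has ended.
            return part_num
        with xmlfile:
            handle(xmlfile)
```

Because only the open() call sits inside the try block, an exception raised by handle itself is not mistaken for the end of the sequence.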

Answer 6

You are doing it wrong. Suppose the folder has 3 files: part0.xml, part1.xml and foo.xml. The loop will iterate 3 times and fail on the third iteration, because it will try to open part2.xml, which does not exist.

Don't loop through all files with the extension .xml.

Only loop through files whose names start with 'part', have a digit immediately before the extension, and have the extension .xml.

So your code will look like this:

import glob

for fname in glob.glob('part*[0-9].xml'):
    with open(fname, 'rb') as xml:
        # here goes the API call code
        pass

See: glob – Filename pattern matching

If you want the files to be uploaded in sequential order, see: String Natural Sort
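
To see which names the pattern 'part*[0-9].xml' accepts without touching the disk, the fnmatch module can be used, since glob follows the same wildcard rules. The file names below are invented for illustration.

```python
import fnmatch

PATTERN = 'part*[0-9].xml'

# Made-up names, used only to illustrate what the pattern accepts.
names = ['part0.xml', 'part10.xml', 'foo.xml', 'part.xml', 'part-old1.xml']

# fnmatchcase compares case-sensitively on every platform.
matches = [n for n in names if fnmatch.fnmatchcase(n, PATTERN)]
print(matches)  # ['part0.xml', 'part10.xml', 'part-old1.xml']
```

Note that * matches any run of characters, so a name like part-old1.xml also gets through; combine the glob with a regex if that matters.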

6 answers

solution1
2 2015-08-31 04:56:52

solution2
2 2015-08-31 05:10:01

solution3
1 2015-08-31 04:27:14

solution4
1 2015-08-31 04:28:19

solution5
1 ACCEPTED 2015-08-31 04:30:29

solution6
1 2015-08-31 05:33:07
