Why is my glob.glob loop not iterating through all text files in folder?

Question

I am trying to read from a folder of text documents with Python 3. Specifically, this is a modification of the LingSpam email spam dataset. I expect my code to return all 1893 text document names, but it returns only the first 420 filenames. I do not understand why it stops short of the total. Any ideas?

import glob
import os

if not os.path.exists('train'):  # download data
  from urllib.request import urlretrieve
  import tarfile
  urlretrieve('http://cs.iit.edu/~culotta/cs429/lingspam.tgz', 'lingspam.tgz')
  tar = tarfile.open('lingspam.tgz')
  tar.extractall()
  tar.close()
abc = []  # collects every matching path
for f in glob.glob("train/*.txt"):
  print(f)
  abc.append(f)
print(len(abc))

I've tried changing the glob params but still no success.
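One way to narrow this down is to compare what the glob pattern matches against everything that is actually in the folder. The sketch below is only a diagnostic; `summarize_folder` is a hypothetical helper name, and it assumes the files sit directly inside the folder rather than in subfolders.

```python
import glob
import os
from collections import Counter


def summarize_folder(folder, pattern="*.txt"):
    """Return (number of glob matches, Counter of file extensions in folder)."""
    matched = glob.glob(os.path.join(folder, pattern))
    # os.listdir shows every entry, including ones the pattern skips
    extensions = Counter(
        os.path.splitext(name)[1].lower() for name in os.listdir(folder)
    )
    return len(matched), extensions


if __name__ == "__main__":
    if os.path.isdir("train"):
        n_matched, extensions = summarize_folder("train")
        print(n_matched, "files matched the pattern")
        print(extensions)
```

If both numbers agree and are still lower than expected, the pattern is fine and the folder itself is missing files.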

Edit: Apparently my code works for everyone but me. Here's my output

Answer 1

Success! The problem was

if not os.path.exists('train'):  # download data

To check my output, I had already downloaded the files onto my computer myself. This line only checks whether the folder exists, and since it did exist, the download and extraction step was skipped and the loop read my own copy of the folder instead. I deleted the files from my machine and now it works as it should, though I suspect running

  from urllib.request import urlretrieve
  import tarfile
  urlretrieve('http://cs.iit.edu/~culotta/cs429/lingspam.tgz', 'lingspam.tgz')
  tar = tarfile.open('lingspam.tgz')
  tar.extractall()
  tar.close()

without the if statement would have had the same result.
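A slightly more defensive guard, as a sketch, is to check how many files are present instead of only whether the folder exists. `needs_download` and `EXPECTED_COUNT` are hypothetical names; the value 1893 is the count given in the question, and the sketch assumes the archive extracts its text files directly into train/.

```python
import glob
import os

EXPECTED_COUNT = 1893  # assumption: file count stated in the question


def needs_download(folder="train", expected=EXPECTED_COUNT):
    """Return True if folder is missing or holds fewer .txt files than expected."""
    if not os.path.isdir(folder):
        return True
    found = len(glob.glob(os.path.join(folder, "*.txt")))
    return found < expected
```

With this, the first line of the script could become `if needs_download():`, so a partial or stale folder triggers a fresh download rather than being silently reused.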

solution1
0 ACCEPTED 2016-03-30 20:50:10
