Way to read multiple .txt files in multiple folder categories in Python

Question

I am new to Python and am trying to read a dataset of .txt files stored in a nested folder hierarchy. The folder structure is:

-Folder1 
   -Category1_Folder
        -file1.txt
   -Category2_Folder
        -file1.txt
        -file2.txt and so on...

The categories are significant: I need to be able to identify which category each file comes from. I then need to remove stop words and perform feature extraction with TF-IDF. What is the easiest way to do this?

Answer 1

I recommend os.walk.

Suppose your directories look like this:

project/
- folder1/
  - file1.png
  - file2.jpg
- folder2/
  - file3.zip

Then this code walks the tree:

import os

for dirpath, dirnames, filenames in os.walk(os.getcwd()):  # os.getcwd() returns the current working directory
  print(dirpath, dirnames, filenames)

It prints:

/project ['folder1', 'folder2'] []
/project/folder1 [] ['file1.png', 'file2.jpg']
/project/folder2 [] ['file3.zip']

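If you need both the folder and the file name, take the folder name from dirpath inside the same os.walk loop (dirnames lists the subfolders of dirpath, not the folder that holds the files):

for dirpath, dirnames, filenames in os.walk(os.getcwd()):
  category = os.path.basename(dirpath)  # name of the folder holding these files
  for filename in filenames:
    if filename.endswith('.txt'):
      # read the file; its category is the containing folder name
      with open(os.path.join(dirpath, filename)) as f:
        text = f.read()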
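
For the layout in the question, the same idea can be written with pathlib. This is only a minimal sketch: the function name load_labeled_texts is hypothetical, and it assumes the files are UTF-8 text and that the category is always the name of the folder directly containing each file.

```python
from pathlib import Path


def load_labeled_texts(root):
    """Return (texts, labels): each .txt file's content and its parent folder name."""
    texts, labels = [], []
    # rglob finds .txt files at any depth; sorted() makes the order deterministic.
    for path in sorted(Path(root).rglob("*.txt")):
        # Assumption: files are UTF-8; undecodable bytes are replaced, not raised.
        texts.append(path.read_text(encoding="utf-8", errors="replace"))
        labels.append(path.parent.name)  # e.g. "Category1_Folder"
    return texts, labels
```

Calling load_labeled_texts("Folder1") gives two parallel lists, so texts[i] belongs to category labels[i]. If scikit-learn is installed (an assumption, it is not mentioned in the question), TfidfVectorizer(stop_words="english").fit_transform(texts) from sklearn.feature_extraction.text should handle both the stop-word removal and the TF-IDF step for English text.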

1 answer

solution1
0 ACCEPTED 2019-02-12 04:38:02
