Python read files in directory and concatenate

I want to write a Python script that searches all folders in the current directory, looks for all .txt files, and creates a file in the current directory that is a concatenation of all those files (in any order). If folders have subfolders, it should not search those subfolders. For example:

main_folder
  folder_1
    sub_folder
      file1.txt
    file2.txt
  folder_2
    file3.txt

The script is placed inside main_folder. It should create a file that is a concatenation of file2.txt and file3.txt (in any order) inside main_folder.

My question is: How can I tell Python to traverse through the folders, look for .txt files, without going into the subfolders?

Use glob:

>>> import glob
>>> glob.glob('main_folder/*/*.txt')
['main_folder/folder_1/file2.txt', 'main_folder/folder_2/file3.txt']
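The glob pattern above only lists the files; the concatenation step still has to be written. A minimal sketch of one way to do it follows. The function name concatenate_txt and the output name combined.txt are assumptions for illustration, not part of the original answer.

```python
import glob
import os


def concatenate_txt(base_dir, out_name="combined.txt"):
    """Concatenate every .txt file exactly one folder below base_dir.

    The function name and the default output name are hypothetical.
    """
    # '*/*.txt' matches files in immediate subfolders only, not deeper ones.
    paths = sorted(glob.glob(os.path.join(base_dir, "*", "*.txt")))
    # The output lives directly in base_dir, so the pattern never matches it.
    out_path = os.path.join(base_dir, out_name)
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as src:
                out.write(src.read())
    return paths
```

Calling concatenate_txt('main_folder') on the tree from the question would read file2.txt and file3.txt and skip file1.txt, because that file sits two folders down.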

From the docs:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.

Use the os.walk() function with the topdown option set to True (the default). Then empty the dirs list in place for every folder below the top level, so that walk() enters the first-level folders but not their subfolders:

import os

for root, dirs, files in os.walk('/base/dir', topdown=True):
    if root != '/base/dir':
        del dirs[:]  # below the top level: do not descend any further
    ...
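To make the pruning concrete, here is a sketch that collects the .txt files one folder below a base directory. Note that dirs must be left alone at the top level, otherwise walk() never enters folder_1 or folder_2 at all. The function name find_txt_one_level_down is an assumption for illustration.

```python
import os


def find_txt_one_level_down(base_dir):
    """Return .txt files located exactly one folder below base_dir.

    The function name is hypothetical.
    """
    found = []
    for root, dirs, files in os.walk(base_dir, topdown=True):
        if root == base_dir:
            # Top level: leave dirs untouched so walk() enters each folder,
            # and ignore files that sit directly in base_dir.
            continue
        # First-level folder: collect its .txt files ...
        for name in files:
            if name.endswith(".txt"):
                found.append(os.path.join(root, name))
        # ... then empty dirs in place so walk() skips the subfolders.
        del dirs[:]
    return sorted(found)
```

The returned paths can then be opened and written to a single output file in any order.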

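If you're on a Unix-based system, I think the easiest way is to call find and cat through Python's subprocess module, as shown here. With BSD find (for example on macOS), the -depth n option really helps: -depth 2 matches entries at depth 2, and -depth +3 matches entries deeper than 3. GNU find's -depth takes no argument; there, -mindepth 2 -maxdepth 2 gives the same restriction.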

>>> import subprocess
>>> lst = subprocess.check_output('find . -name "*.txt" -depth 2', shell=True, universal_newlines=True)
>>> print(lst)
./folder_1/f1.txt
./folder_2/f2.txt
>>> out_f = subprocess.check_output('cat ' + lst.replace('\n', ' '), shell=True, universal_newlines=True)
>>> print(out_f)
inside Folder_1
inside Folder 2

You can avoid shell=True by passing the command as a list with one element per argument.
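A sketch of the list-of-arguments form is below. It assumes a Unix-like system with find and cat on the PATH, and it swaps in -mindepth 2 -maxdepth 2 for -depth 2, since both GNU and BSD find accept that pair. The function name concat_with_find_and_cat is hypothetical.

```python
import subprocess


def concat_with_find_and_cat(base_dir="."):
    """Return the concatenated contents of .txt files one folder below base_dir.

    Assumes the external commands find and cat are available.
    """
    # Each argument is its own list element, so no shell is involved and
    # the "*.txt" pattern reaches find unexpanded.
    listing = subprocess.check_output(
        ["find", base_dir, "-mindepth", "2", "-maxdepth", "2",
         "-type", "f", "-name", "*.txt"],
        universal_newlines=True,
    )
    paths = sorted(listing.splitlines())
    if not paths:
        # cat with no file arguments would wait on stdin, so stop here.
        return ""
    return subprocess.check_output(["cat"] + paths, universal_newlines=True)
```

Passing the paths as list elements also means file names containing spaces are handled correctly, which the string-replace approach does not guarantee.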
