
Looping through (and opening) specific types of files in folder?

I want to loop through the files with a certain extension in a folder (in this case .txt), open each file, and print the matches for a regex pattern. When I run my program, however, it only prints results for one of the two files in the folder. My first file contains the text:

Anthony is too cool for school. I Reported the criminal. I am Cool.

and the program prints only:

1: A, I, R, I, C

My second file contains the text:

Oh My initials are AK

And finally my code:

import re, os

Regex = re.compile(r'[A-Z]')
filepath =input('Enter a folder path: ')
files = os.listdir(filepath)
count = 0

for file in files:
    if '.txt' not in file:
        del files[files.index(file)]
        continue
    count += 1
    fileobj = open(os.path.join(filepath, file), 'r')
    filetext = fileobj.read()
    Matches = Regex.findall(filetext)
    print(str(count)+': ' +', '.join(Matches), end = ' ')
    fileobj.close()

Is there a way to loop through (and open) a list of files? Or is the problem that I assign every file object returned by open(os.path.join(filepath, file), 'r') to the same name, fileobj?

You can do it as simply as this (it's just a loop over the files, without deleting anything from the list):

import re, os

Regex = re.compile(r'[A-Z]')
filepath = input('Enter a folder path: ')
files = os.listdir(filepath)
count = 0

for file in files:
    # Skip non-.txt entries instead of deleting them from the list being iterated
    if file.endswith('.txt'):
        count += 1
        fileobj = open(os.path.join(filepath, file), 'r')
        filetext = fileobj.read()
        Matches = Regex.findall(filetext)
        print(str(count) + ': ' + ', '.join(Matches), end=' ')
        fileobj.close()

The del statement is causing the problem. The for loop has no idea whether you deleted an element, so it always advances to the next position. There might be a hidden (non-.txt) file in the directory that is the first element of files. After it is deleted, every remaining element shifts one position to the left, so the loop skips the file that moved into the freed slot and reads only the other one. To verify this, print files and file at the beginning of each iteration. In short, removing the del line should solve the problem.
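A minimal sketch that reproduces the skipping without touching the file system. The list of names is hypothetical (".hidden" stands in for whatever non-.txt entry os.listdir() returns first), and buggy_filter / fixed_filter are illustrative helpers, not part of the question's code. The first mimics the question's loop; the second builds a new list instead of mutating the one being iterated.

```python
def buggy_filter(files):
    """Mimics the question's loop: deletes non-.txt names while iterating."""
    files = list(files)  # work on a copy so the caller's list is untouched
    seen = []
    for file in files:
        if '.txt' not in file:
            # Removing an element shifts the rest left, so the loop's
            # internal index now points past the element that moved into this slot
            del files[files.index(file)]
            continue
        seen.append(file)
    return seen


def fixed_filter(files):
    """Builds a new list instead of mutating the one being iterated over."""
    return [file for file in files if file.endswith('.txt')]


names = ['.hidden', 'first.txt', 'second.txt']  # hypothetical listdir() result
print(buggy_filter(names))  # ['second.txt']
print(fixed_filter(names))  # ['first.txt', 'second.txt']
```

Only one of the two .txt files survives the buggy loop, which matches the symptom in the question.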

If this is a standalone script, bash might be cleaner:

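#!/usr/bin/env bash
# Usage: ./script.sh /path/to/folder
shopt -s nullglob   # with no .txt files, skip the loop instead of using the literal pattern
count=0
for file in "$1"/*.txt; do
    count=$((count + 1))   # number files from 1, like the Python version
    # grep -o prints each capital letter on its own line; paste joins them with commas
    echo -n "${count}: $(grep -o '[A-Z]' "$file" | paste -sd, -) "
done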

The glob module will help you much more, since you only want to read files with a specific extension.

You can directly get the list of files with the .txt extension, i.e. you save one if statement.

For more information, see the Python documentation for the glob module.

The code will be shorter and more readable.

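import glob
import re

Regex = re.compile(r'[A-Z]')

# Replace this pattern with your own folder; glob returns only paths ending in .txt
for count, file_name in enumerate(glob.glob(r'C:/Users/dinesh_pundkar\Desktop/*.txt'), start=1):
    with open(file_name, 'r') as f:
        text = f.read()
    # Regex matching on the file text, as in the question
    Matches = Regex.findall(text)
    print(str(count) + ': ' + ', '.join(Matches), end=' ')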

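A self-contained sketch for checking the glob approach end to end. It builds a throwaway folder with tempfile; the file names are hypothetical, the two .txt contents are the question's sample files, and notes.md is an extra non-.txt file that the *.txt pattern should ignore. capitals_per_txt_file is an illustrative helper name. sorted() is used because glob makes no promise about the order of the returned paths.

```python
import glob
import os
import re
import tempfile

Regex = re.compile(r'[A-Z]')


def capitals_per_txt_file(folder):
    """Return {file name: list of capital letters} for every *.txt file in folder."""
    results = {}
    # sorted() because glob returns paths in arbitrary order
    for path in sorted(glob.glob(os.path.join(folder, '*.txt'))):
        with open(path, 'r') as f:
            results[os.path.basename(path)] = Regex.findall(f.read())
    return results


with tempfile.TemporaryDirectory() as folder:
    samples = {
        'first.txt': 'Anthony is too cool for school. I Reported the criminal. I am Cool.',
        'second.txt': 'Oh My initials are AK',
        'notes.md': 'Not A Text File',  # not matched by the *.txt pattern
    }
    for name, text in samples.items():
        with open(os.path.join(folder, name), 'w') as f:
            f.write(text)
    result = capitals_per_txt_file(folder)

for count, (name, matches) in enumerate(result.items(), start=1):
    print(str(count) + ': ' + ', '.join(matches))
# 1: A, I, R, I, C
# 2: O, M, A, K
```

Both .txt files are processed and the .md file never reaches the regex, without any explicit extension check in the loop.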
The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM