
List files that exist and do not exist in a given directory and its sub-directories

I have a CSV file containing a set of filenames, and I would like to check whether each one exists in a directory or any of its sub-directories.

List of files in CSV:

List of files
0    add_even_blank_page_with_text.py
1                    add_even_page.py
2                     add_text_pdf.py
3              waste_data_cleaning.py
4                            hello.py
5                              111.py

I have written a script that works; see below:

#Import Packages
import os
import pandas as pd
import csv

path=r'C:\Users\sarah\.spyder-py3'
file=r'C:\Users\sarah\.spyder-py3\list.csv'
new=r'C:\Users\sarah\.spyder-py3\output90.csv'

#Read in CSV File
list=pd.read_csv(file, header=None,skiprows=[0], dtype=str, names=['File'],usecols=[0], squeeze=True)
print(list)

# Open the output CSV file and create a writer for it.
f=open(new, 'w', newline='')
writer = csv.writer(f)

#Check if each file exists or not
for root, dirs, files in os.walk(path):
    for files in list:
        dir=os.path.join(root, files)
        if os.path.exists(dir):
            print(dir,'- exists')
            exists=dir+' -exists'
            writer.writerow([exists])
        else:
            print(dir,'- not exists')
            notexists=dir+'not exists'
            writer.writerow([notexists])

#Close the output CSV so all rows are flushed to disk
f.close()

However, the output tests every filename against every folder under the directory and reports, with the full path, whether the file exists there, so there are 100+ rows in my Excel file.

C:\Users\sarah\.spyder-py3\add_even_blank_page_with_text.py - exists
C:\Users\sarah\.spyder-py3\add_even_page.py - exists
C:\Users\sarah\.spyder-py3\add_text_pdf.py - exists
C:\Users\sarah\.spyder-py3\waste_data_cleaning.py - exists
C:\Users\sarah\.spyder-py3\hello.py - not exists
C:\Users\sarah\.spyder-py3\111.py - not exists
C:\Users\sarah\.spyder-py3\.pylint.d\add_even_blank_page_with_text.py - not exists
C:\Users\sarah\.spyder-py3\.pylint.d\add_even_page.py - not exists
C:\Users\sarah\.spyder-py3\.pylint.d\add_text_pdf.py - not exists
C:\Users\sarah\.spyder-py3\.pylint.d\waste_data_cleaning.py - not exists

Instead, I would like the output to list each filename once, together with its full path if the file exists, which would bring back only 6 rows (one per filename).

add_even_blank_page_with_text.py        <FullFilepath> exist
add_even_page.py                        <FullFilepath> exist
add_text_pdf.py                         <FullFilepath> exist
waste_data_cleaning.py                  <FullFilepath> exist
hello.py                                Not exist
111.py                                  Not exist

Would anyone be able to help as to how to format this? I seem to have gone round in circles on this one. Thanks in advance.

You can use the os.path.basename(<full_file_name_here>) function to get the base filename from a full path. Save the results to a list and sort them before you print or save them.
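As a minimal sketch of that suggestion, the snippet below strips the directory part from a few paths and sorts the bare filenames. The paths are hypothetical and are built with os.path.join so the example behaves the same on any operating system.

```python
import os

# Hypothetical full paths, built with os.path.join so the example runs on any OS
base = os.path.join("projects", "scripts")
full_paths = [
    os.path.join(base, "hello.py"),
    os.path.join(base, "sub", "add_even_page.py"),
    os.path.join(base, "111.py"),
]

# Strip the directory part of each path, then sort the bare filenames
names = sorted(os.path.basename(p) for p in full_paths)
print(names)
```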

Secondly, you can read all the filenames in a folder once and then check whether your file is among them. The reason is that each os.path.exists call is an I/O operation, so calling it once per filename per folder costs more than a single listing when the folder holds only a few hundred files.
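One way to sketch this idea is to walk the tree once, record where each filename was first seen, and then look up the wanted names in memory. The helper names index_files and check_files are hypothetical, and the demo runs against a throwaway temporary folder rather than the real directory.

```python
import os
import tempfile

def index_files(top):
    """Map each filename under `top` to the first full path where it was seen."""
    index = {}
    for root, _dirs, filenames in os.walk(top):
        for name in filenames:
            # setdefault keeps the first occurrence and ignores later duplicates
            index.setdefault(name, os.path.join(root, name))
    return index

def check_files(wanted, top):
    """Return one (filename, path or None) pair per wanted name."""
    index = index_files(top)
    return [(name, index.get(name)) for name in wanted]

# Demo on a throwaway directory tree (a stand-in for the real folder)
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    open(os.path.join(top, "sub", "add_even_page.py"), "w").close()
    results = check_files(["add_even_page.py", "hello.py"], top)
    for name, path in results:
        print(name, path if path else "Not exist")
```

Because the lookup happens in a dictionary, the number of filesystem calls depends on the size of the tree, not on how many filenames are in the list.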

From what I understand, you want to keep track of whether you've seen each file. You can use a set to do that, then print it at the end. Here's a basic example which keeps track of the first occurrence, no formatting or anything fancy:

import os

path = r'C:\Users\sarah\.spyder-py3'

to_find = {
    'add_even_blank_page_with_text.py',
    'add_even_page.py',
    'add_text_pdf.py',
    'waste_data_cleaning.py',
    'hello.py',
    '111.py',
    }

found = set()

for root, dirs, files in os.walk(path):
    if not to_find:  # If none left to find, stop looking
        break

    # Files we're searching for that are in the current "root"
    for file in to_find & set(files):
        found.add((root, file))
        to_find.remove(file)

for root, file in found:
    print('+', os.path.join(root, file))

for file in to_find:
    print('-', file)

The output should look like this (the order of the lines may vary, since sets are unordered):

+ C:\Users\sarah\.spyder-py3\add_even_blank_page_with_text.py
+ C:\Users\sarah\.spyder-py3\add_even_page.py
+ C:\Users\sarah\.spyder-py3\add_text_pdf.py
+ C:\Users\sarah\.spyder-py3\waste_data_cleaning.py
- hello.py
- 111.py
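To get the one-row-per-file layout the question asks for, the found pairs and the leftover names can be turned into rows and passed to csv.writer. This is only a sketch: found and to_find below are hypothetical stand-ins shaped like the variables in the example above, build_rows is a made-up helper name, and the rows go to an in-memory buffer instead of a real file.

```python
import csv
import io
import os

# Stand-ins shaped like the variables in the example above (hypothetical values)
found = {(os.path.join("base", "sub"), "add_even_page.py")}
to_find = {"hello.py", "111.py"}

def build_rows(found, missing):
    """Build one row per filename: [name, full path or '', status]."""
    rows = [[name, os.path.join(root, name), "exist"] for root, name in found]
    rows += [[name, "", "Not exist"] for name in missing]
    return sorted(rows)  # sort by filename so the order is stable

rows = build_rows(found, to_find)

# Written to an in-memory buffer here; a file opened with
# open(path, "w", newline="") can be passed to csv.writer the same way
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```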

By the way, avoid variable names like list, since that shadows the builtin list. On the same note, your inner loop overwrites the files variable that os.walk provides.
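A small illustration of why shadowing matters: once a name such as list is rebound inside a scope, the builtin can no longer be called from that scope. The function names here are hypothetical and exist only for the demonstration.

```python
def shadowed():
    list = ["a.py", "b.py"]  # rebinding hides the builtin inside this function
    try:
        return list("abc")  # tries to call the list object, not the builtin
    except TypeError as exc:
        return "TypeError: " + str(exc)

def not_shadowed():
    filenames = ["a.py", "b.py"]  # a non-clashing name leaves the builtin usable
    return list("abc")

print(shadowed())
print(not_shadowed())
```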
