How do I check if files in my folder coincide with the file names specified in my .csv file?

Question

I am trying to build a method that checks whether the file names in my .csv file match the file names in my actual file folder. If they don't match, I want to delete the whole row from my .csv file. Here is what I have tried so far:

import csv
import os
from pathlib import Path

dir_path = Path(r'D:\audio_files')

csv_file_path = Path(r'D:\metadata.csv')

lines = list()
files = list()

for f in os.listdir(dir_path):
    f = f.strip('.wav')
    files.append(str(f))

with open(csv_file_path, 'r') as read_file:
    reader = csv.reader(read_file)
    for row in reader:
        lines.append(row)
        for field in row:
            for f in files:
                if field != f:
                    print("Line Removed.")
                    lines.remove(row)

However, I keep getting this error:

Traceback (most recent call last):
File "file_checker.py", line 26, in <module>
lines.remove(row)
ValueError: list.remove(x): x not in list

What is it I should fix to get it to work?

EDIT:

Here is a small sample of my .csv file. It's very straightforward: the first column contains the file names without the extension, and the second column contains the labels for those files.

fname	label
236421	Male_speech
124818	Female_speech
426906	Male_speech

And so on.

I am basically trying to match the names in the fname column to the ones in my file folder (with the extension .wav). If a name does not exist in the file folder, I want to delete the row for that non-existent file name.

EDIT #2:

I managed to solve the problem with a bit of local help. Here is the final product:

import csv
import os

# Raw strings keep the backslashes literal ('\a' and '\n' would otherwise be read as escape sequences)
dir_path = r'D:\audio'

csv_file_path = r'D:\original.csv'

# Create a new file that will hold the rows of the original csv file whose fnames match the file names in my file folder
csv_new_file = open(r'D:\new.csv', 'w', newline="")

# Create a writer variable that will allow me to write rows in my new csv file
csv_write = csv.writer(csv_new_file, delimiter=',', quotechar='"')

# "i" variable will allow me to write the headers from the original csv file
i = 0
with open(csv_file_path, 'r', newline="") as read_file:
    reader = csv.reader(read_file, delimiter=',', quotechar='"')
    for row in reader:
        # If the row is the very first, then write it as is (headers)
        if i == 0:
            csv_write.writerow(row)
            i += 1
            continue
        # Check if the file path for my audio file with the .wav extension exists, and if so write the row of the original csv in my new csv
        file_path = os.path.join(dir_path, row[0] + '.wav')
        if os.path.exists(file_path):
            csv_write.writerow(row)

# IMPORTANT to close the output file once finished! (read_file is already closed by the "with" block)
csv_new_file.close()

Answer 1

Consider this block:

for f in files:
    if field != f:
        lines.remove(row)

This says: if the value of field is not equal to the value of f, remove the row. Because files is a list of file names, the row is removed unless the very first element of files matches field. The iteration then continues after the row has already been removed, so the next mismatch calls lines.remove(row) on a row that is no longer in the list, which raises the ValueError.
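
To see the failure in isolation, here is a minimal sketch that mirrors the nested loops from the question. The contents of files and the single row are made-up sample data, not the asker's real folder or csv file. Note that the row is removed even though its fname does exist in files, and the error is raised on the second removal attempt.

```python
# Hypothetical data: two file names and one csv row (fname, label)
files = ["236421", "124818"]
row = ["236421", "Male_speech"]
lines = [row]

error = None
try:
    for field in row:
        for f in files:
            if field != f:
                # The first mismatch removes the row; the next mismatch
                # tries to remove it again and raises ValueError
                lines.remove(row)
except ValueError as exc:
    error = str(exc)

print(lines)   # the row is gone even though "236421" is in files
print(error)   # the ValueError message from the second remove
```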

Instead, I recommend making files a set and checking for membership in the set:

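import csv
import os
from pathlib import Path

dir_path = Path(r'D:\audio_files')

csv_file_path = Path(r'D:\metadata.csv')

lines = list()
files = set()

for f in os.listdir(dir_path):
    # splitext drops only the extension; str.strip('.wav') would also strip
    # leading/trailing '.', 'w', 'a' and 'v' characters from the name itself
    name, ext = os.path.splitext(f)
    files.add(name)

with open(csv_file_path, 'r', newline='') as read_file:
    reader = csv.reader(read_file)
    for row in reader:
        lines.append(row)
        # Only the first column (fname) holds a file name; the label column
        # never matches a file. reader.line_num == 1 is the header row, so keep it.
        if reader.line_num > 1 and row[0] not in files:
            lines.remove(row)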
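
A related detail when turning file names into bare names: str.strip('.wav') treats its argument as a set of characters to remove from both ends, not as a suffix. The sketch below uses made-up file names to compare it with os.path.splitext, which removes only the extension. Purely numeric names such as the ones in the sample happen to survive strip unchanged, which can hide the problem.

```python
import os

# Hypothetical file names, chosen to show the difference
names = ["wave.wav", "236421.wav", "vault.wav"]

# strip removes any of the characters '.', 'w', 'a', 'v' from both ends
stripped = [n.strip(".wav") for n in names]

# splitext splits off only the final extension
stems = [os.path.splitext(n)[0] for n in names]

print(stripped)  # ['e', '236421', 'ult']
print(stems)     # ['wave', '236421', 'vault']
```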

I would personally split those loops up: build the list of rows first, then iterate over a copy while removing elements. But that may just be personal preference.
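
A rough sketch of that two-step variant follows. It is only an illustration: the csv text and the files set are made-up stand-ins for the real csv file and folder listing, and it assumes the first column holds the file name and the first row is a header.

```python
import csv
import io

# Hypothetical stand-ins for the csv file and the folder contents
csv_text = (
    "fname,label\n"
    "236421,Male_speech\n"
    "124818,Female_speech\n"
    "426906,Male_speech\n"
)
files = {"236421", "426906"}

# Step 1: build the full list of rows, keeping the header separate
with io.StringIO(csv_text) as read_file:
    reader = csv.reader(read_file)
    header = next(reader)
    lines = list(reader)

# Step 2: iterate over a copy so removing from the original list is safe
for row in list(lines):
    if row[0] not in files:
        lines.remove(row)

print(header)  # ['fname', 'label']
print(lines)   # [['236421', 'Male_speech'], ['426906', 'Male_speech']]
```

Iterating over list(lines) rather than lines itself matters here: removing items from a list while looping over that same list makes the loop skip elements.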
