
If Statement Condition Met but Does Not Execute (Python)

Hi, I have a list of WindowsPath objects which I am running an if statement on. Background: I have several CSV files, and my code checks each one. If a CSV file is good, the script moves it to a directory called "archive". If there is an error it is moved to "error", and if it is empty it goes to "empty".

So I have a file that has already been moved to archive. I copied this file back to the base directory for the script to process again. However, the if statement that is supposed to catch this duplicate doesn't execute, and instead the script tries to move the file to the archive directory. When this happens, because I am using the Path.rename() method to move my files, I get the following error: FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\06_17_2020_FMGN520.csv' -> 'C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\archive\06_17_2020_FMGN520.csv'

These are the functions involved. Does anyone know why this is happening?

def make_dict_of_csvprocessing_dirs():
    dir_dict = process_dirconfig_file(dirconfig_file)
    # print(dir_dict)
    dictofpdir_flist = {} #dictionary of lists of files in different processing dirs
    csvbase_file_dir = dir_dict["base_dir"]
    csvhistory_Phandler = Path(csvbase_file_dir)
    csvbase_path_list = [file for file in csvhistory_Phandler.glob("*.*")]
    dictofpdir_flist["csvbase_path_list"] = csvbase_path_list

    archive_dir = dir_dict["archive_dir"]
    archive_Phandler = Path(archive_dir)
    archivefiles_path_set = {file for file in archive_Phandler.rglob("*.*")}
    dictofpdir_flist["archivefiles_path_set"] = archivefiles_path_set

    return dir_dict, dictofpdir_flist

The function where the error occurs:

def odf_history_from_csv_to_dbtable(db_instance):
    odfsdict = db_instance['odfs_tester_history']
    #table_row = {}
    totalresult_list = []

    dir_dict, dictofpdir_flist = make_dict_of_csvprocessing_dirs()
    print(dir_dict)
    csvbase_path_list = dictofpdir_flist["csvbase_path_list"]
    archivefiles_path_set = dictofpdir_flist["archivefiles_path_set"]

    for csv in csvbase_path_list:  # is there a faster way to compare the list of files in archive and history?
        if csv in archivefiles_path_set:
            print(csv.name + " is in archive folder already")
        else:
            csvhistoryfilelist_to_dbtable(csv, db_instance)
            df_tuple = process_csv_formatting(csv)
            df_cnum, odfscsv_df = df_tuple
            if df_cnum == 1:
                trg_path = Path(dir_dict['empty_dir'])
                csv.rename(trg_path.joinpath(csv.name))

    return totalresult_list

When I debug, PyCharm gives me the following values. Notice how the slashes in the directory listings are reversed; I wonder if this is the issue:

archivefiles_path_set={WindowsPath('C:/Users/sys_nsgprobeingestio/Documents/dozie/odfs/odfshistory/archive/06_17_2020_FMGN520.csv')}

csv = {WindowsPath}C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\06_17_2020_FMGN520.csv

csvbase_path_list = 
[WindowsPath('C:/Users/sys_nsgprobeingestio/Documents/dozie/odfs/odfshistory/06_17_2020_FMGN520.csv')]
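
The slash direction is a red herring: pathlib compares paths logically, so forward and back slashes in Windows paths compare equal. What the debug output actually shows is that the set holds a path under `archive\`, while `csv` points at the base directory, so the full paths differ and the membership test is False even though the file names match. A quick check (using `PureWindowsPath` so it runs on any OS):

```python
from pathlib import PureWindowsPath

# slash direction does not affect equality: pathlib normalizes separators
a = PureWindowsPath("C:/Users/sys_nsgprobeingestio/Documents/dozie/odfs/odfshistory/06_17_2020_FMGN520.csv")
b = PureWindowsPath(r"C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\06_17_2020_FMGN520.csv")
print(a == b)  # True

# but a different parent directory makes a different path,
# even though the file *name* is identical
archived = PureWindowsPath(r"C:\Users\sys_nsgprobeingestio\Documents\dozie\odfs\odfshistory\archive\06_17_2020_FMGN520.csv")
print(a == archived)            # False -> this is why the 'in' test never fires
print(a.name == archived.name)  # True  -> comparing names would catch the duplicate
```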

Probably the fastest way to find which files need to be copied (assuming yours is the only process accessing both directories):

import os

basedir = r"c:/temp/csvs"
archdir = os.path.join(basedir, "temp")

def what_to_copy(frm_dir, to_dir):
    return set(os.listdir(frm_dir)).difference(os.listdir(to_dir))

copy_names = what_to_copy(basedir, archdir)
print(copy_names) # you need to prepend the dirs when copying, use os.path.join
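
Following the comment above, a minimal sketch of actually performing the copy. The function name `copy_missing` and the use of `shutil.copy2` are my additions, not part of the original snippet:

```python
import os
import shutil

def copy_missing(frm_dir, to_dir):
    """Copy every file present in frm_dir but not in to_dir; return the copied names."""
    missing = set(os.listdir(frm_dir)).difference(os.listdir(to_dir))
    for name in missing:
        # prepend the directory names to build full source/target paths
        shutil.copy2(os.path.join(frm_dir, name), os.path.join(to_dir, name))
    return missing
```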

Your code seems quite complex for such a small task (lots of storing things in dicts only to pull them out again). This is how it could work:

import os

# boilerplate code to create files and put some of them into the archive already
names = [ f"file_{i}.csv" for i in range(10,60)]
basedir = r"c:/temp/csvs"
archdir = os.path.join(basedir,"temp")

os.makedirs(basedir, exist_ok = True)
os.makedirs(archdir, exist_ok = True)

def create_files():
    for idx, fn in enumerate(names):
        # create all files in basedir
        with open(os.path.join(basedir,fn),"w") as f:
            f.write(" ")
        # every 3rd file goes into archdir as well
        if idx%3 == 0:
            with open(os.path.join(archdir,fn),"w") as f:
                f.write(" ")


create_files()

Function to "copy" a file only if it does not already exist:

def copy_from_to_if_not_exists(frm, to):
    """'frm': full path to a file, 'to': directory to copy into"""
    # normalize paths so they compare equally regardless of C:/temp vs C:\\temp
    frm = os.path.normpath(frm)
    to = os.path.normpath(to)

    fn = os.path.basename(frm)
    src_dir = os.path.dirname(frm)  # renamed from 'dir' to avoid shadowing the builtin

    if src_dir != to:
        if fn in os.listdir(to):
            print(fn, " -> already exists!")
        else:
            # you would copy the file here instead of printing
            print(fn, " -> could be copied")

# print what's in the basedir as well as the archive dir (os.walk descends into subdirs)
for root,dirs,files in os.walk(basedir):
    print(root + ":", files, sep="\n")

for file in os.listdir(basedir):
    copy_from_to_if_not_exists(os.path.join(basedir,file),archdir)

If your hard drive's read cache is not good enough for you, you can cache the result of os.listdir(to), but it is probably fine as is.
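
Since the question's code already uses pathlib, the same existence check can be written with Path objects. This is a sketch under my own naming (`copy_if_not_exists`, and `shutil.copy2` to do the real copy), not the original poster's method:

```python
import shutil
from pathlib import Path

def copy_if_not_exists(src: Path, dest_dir: Path) -> bool:
    """Copy src into dest_dir unless a file with the same name is already there."""
    target = Path(dest_dir) / src.name
    if target.exists():
        print(src.name, " -> already exists!")
        return False
    shutil.copy2(src, target)  # actually copies, instead of just printing
    return True
```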

Output:

c:/temp/csvs:
['file_10.csv','file_11.csv','file_12.csv','file_13.csv','file_14.csv','file_15.csv',
 'file_16.csv','file_17.csv','file_18.csv','file_19.csv','file_20.csv','file_21.csv',
 'file_22.csv','file_23.csv','file_24.csv','file_25.csv','file_26.csv','file_27.csv',
 'file_28.csv','file_29.csv','file_30.csv','file_31.csv','file_32.csv','file_33.csv',
 'file_34.csv','file_35.csv','file_36.csv','file_37.csv','file_38.csv','file_39.csv', 
 'file_40.csv','file_41.csv','file_42.csv','file_43.csv','file_44.csv','file_45.csv',
 'file_46.csv','file_47.csv','file_48.csv','file_49.csv','file_50.csv','file_51.csv', 
 'file_52.csv','file_53.csv','file_54.csv','file_55.csv','file_56.csv','file_57.csv',
 'file_58.csv','file_59.csv']

c:/temp/csvs\temp:
['file_10.csv','file_13.csv','file_16.csv','file_19.csv','file_22.csv','file_25.csv', 
 'file_28.csv','file_31.csv','file_34.csv','file_37.csv','file_40.csv','file_43.csv',
 'file_46.csv','file_49.csv','file_52.csv','file_55.csv','file_58.csv']

file_10.csv  -> already exists!
file_11.csv  -> could be copied
file_12.csv  -> could be copied
file_13.csv  -> already exists!
file_14.csv  -> could be copied
file_15.csv  -> could be copied
file_16.csv  -> already exists!
file_17.csv  -> could be copied
file_18.csv  -> could be copied
[...snipp...]
file_55.csv  -> already exists!
file_56.csv  -> could be copied
file_57.csv  -> could be copied
file_58.csv  -> already exists!
file_59.csv  -> could be copied 

See functools.lru_cache for ways to cache function results, and consider putting the os.listdir(archdir) call into a function that caches its result if directory reads become a bottleneck (measure first, then optimize).
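
A sketch of what that caching could look like (`cached_listdir` is a made-up name; note the cache goes stale when the directory changes, so call `cache_clear()` after moving files):

```python
import os
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_listdir(path):
    # the disk is read only on the first call per path;
    # later calls return the cached tuple
    return tuple(os.listdir(path))

# after moving or creating files, invalidate with:
# cached_listdir.cache_clear()
```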
