Delete CSV file if missing specific column using Python

Currently my code looks at the CSV files in a folder and replaces strings if the file has a column 'PROD_NAME' in the data. If a file doesn't have the column 'PROD_NAME', I'm trying to delete it from the folder. With a little debugging I can get my code to print which CSV files do not have the column, but I can't figure out how to actually delete them from the folder they are in. I have tried an if statement that calls os.remove(), and still nothing happens. No errors or anything; the script just finishes with all the files still in the folder. Here is my code. Any help is appreciated. Thanks!

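import glob
import os
import time
from pathlib import Path

import pandas as pd

# dest_dir (the folder containing the CSV files) is assumed to be defined elsewhere in the script

def worker():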
    filenames = glob.glob(dest_dir + '\\*.csv')
    print("Finding all files with column PROD_NAME")
    time.sleep(3)
    print("Changing names of products in these tables...")
    for filename in filenames:
        
        my_file = Path(os.path.join(dest_dir, filename))
        
        try:
            with open(filename):
            # read data
                df1 = pd.read_csv(filename, skiprows=1, encoding='ISO-8859-1', nrows=0)  # read column header only - to get the list of columns
                dtypes = {}
                for col in df1.columns:# make all columns text, to avoid formatting errors
                    dtypes[col] = 'str'
                df1 = pd.read_csv(filename, dtype=dtypes, skiprows=1, encoding='ISO-8859-1')

                if 'PROD_NAME' not in df1.columns:
                    os.remove(filename)
                    
                #Replaces text in files
                if 'PROD_NAME' in df1.columns: 
                    df1 = df1.replace("NABVCI", "CLEAR_BV")
                    df1 = df1.replace("NAMVCI", "CLEAR_MV")
                    df1 = df1.replace("NA_NRF", "FA_GUAR")
                    df1 = df1.replace("N_FPFA", "FA_FLEX")
                    df1 = df1.replace("NAMRFT", "FA_SECURE_MVA")
                    df1 = df1.replace("NA_RFT", "FA_SECURE")
                    df1 = df1.replace("NSPFA7", "FA_PREFERRED")
                    df1 = df1.replace("N_ENHA", "FA_ENHANCE")
                    df1 = df1.replace("N_FPRA", "FA_FLEX_RETIRE")
                    df1 = df1.replace("N_SELF", "FA_SELECT")
                    df1 = df1.replace("N_SFAA", "FA_ADVANTAGE")
                    df1 = df1.replace("N_SPD1", "FA_SPD1")
                    df1 = df1.replace("N_SPD2", "FA_SPD2")
                    df1 = df1.replace("N_SPFA", "FA_LIFESTAGES")
                    df1 = df1.replace("N_SPPF", "FA_PLUS")
                    df1 = df1.replace("N__CFA", "FA_CHOICE")
                    df1 = df1.replace("N__OFA", "FA_OPTIMAL")
                    df1 = df1.replace("N_SCNI", "FA_SCNI")
                    df1 = df1.replace("NASCI_", "FA_SCI")
                    df1 = df1.replace("NASSCA", "FA_SSC")
                    df1.to_csv(filename, index=False, quotechar="'")            
                
        except:
            if 'PROD_NAME' in df1.columns:
                print("Could not find string to replace in this file: " + filename)
                    
worker()
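The chain of twenty df1.replace(...) calls above can be collapsed into a single call, because pandas' `DataFrame.replace` accepts a flat dict mapping old values to new values when no `value` argument is given. A minimal sketch, assuming the same exact whole-cell matching as the original calls (no regex, so a cell is only changed when its entire value equals one of the old codes); `PRODUCT_NAMES` and `rename_products` are hypothetical names:

```python
import pandas as pd

# Old product code -> new product name, copied from the question's replace() calls.
PRODUCT_NAMES = {
    "NABVCI": "CLEAR_BV",
    "NAMVCI": "CLEAR_MV",
    "NA_NRF": "FA_GUAR",
    "N_FPFA": "FA_FLEX",
    "NAMRFT": "FA_SECURE_MVA",
    "NA_RFT": "FA_SECURE",
    "NSPFA7": "FA_PREFERRED",
    "N_ENHA": "FA_ENHANCE",
    "N_FPRA": "FA_FLEX_RETIRE",
    "N_SELF": "FA_SELECT",
    "N_SFAA": "FA_ADVANTAGE",
    "N_SPD1": "FA_SPD1",
    "N_SPD2": "FA_SPD2",
    "N_SPFA": "FA_LIFESTAGES",
    "N_SPPF": "FA_PLUS",
    "N__CFA": "FA_CHOICE",
    "N__OFA": "FA_OPTIMAL",
    "N_SCNI": "FA_SCNI",
    "NASCI_": "FA_SCI",
    "NASSCA": "FA_SSC",
}


def rename_products(df: pd.DataFrame) -> pd.DataFrame:
    # A flat dict with no `value` argument replaces every cell that exactly
    # equals a key with the corresponding value, in all columns.
    return df.replace(PRODUCT_NAMES)


if __name__ == "__main__":
    df = pd.DataFrame({"PROD_NAME": ["NABVCI", "N_SPD2", "OTHER"], "QTY": ["1", "2", "3"]})
    print(rename_products(df)["PROD_NAME"].tolist())  # ['CLEAR_BV', 'FA_SPD2', 'OTHER']
```

For this particular mapping the single call gives the same result as the sequential chain, since none of the new names is itself one of the old codes.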

Below is a block of code that reads the raw CSV data. It reads the first line of the file (containing the column names) and looks for the column name PROD_NAME. If it finds it, it sets found to True; otherwise, it sets found to False. To avoid trying to delete the file while it is still open, the removal is done outside the with open() block.


import os

filename = "test.csv"

with open(filename) as f:
    # strip the newline so that PROD_NAME still matches when it is the last column
    if "PROD_NAME" in f.readline().rstrip("\n").split(","):
        print("found")
        found = True
    else:
        print("not found")
        found = False
if not found:
    os.remove(filename)
else:
    pass  # carry out the replacements here / load the file in pandas
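The same idea can be applied to every CSV file in the folder. The sketch below is a hedged variant, not the answer's exact code: it reads the header with the standard csv module (which handles quoted column names and the trailing newline on the last column), deletes a file only after it has been closed, and reports any `OSError` instead of hiding it behind a bare `except:`. The `skip_lines=1` default is an assumption based on the question's `skiprows=1`, i.e. that the column names sit on the second line of each file; `header_has_column` and `prune_csv_files` are hypothetical names.

```python
import csv
import glob
import os


def header_has_column(path, column, skip_lines=1, encoding="ISO-8859-1"):
    """Return True if the header row of the CSV at `path` contains `column`.

    `skip_lines` rows are skipped before the header, mirroring
    pd.read_csv(..., skiprows=1) in the question (an assumption).
    """
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        for _ in range(skip_lines):
            next(reader, None)  # skip a leading row; None if the file is short
        header = next(reader, [])  # empty list if there is no header row
    # The file is closed here, so the caller can delete it (also on Windows).
    return column in header


def prune_csv_files(dest_dir, column="PROD_NAME", skip_lines=1):
    """Delete every CSV in dest_dir whose header lacks `column`; return kept paths."""
    kept = []
    for path in glob.glob(os.path.join(dest_dir, "*.csv")):
        if header_has_column(path, column, skip_lines):
            kept.append(path)
            continue
        try:
            os.remove(path)
            print("Removed", path)
        except OSError as exc:
            # Report the failure instead of silently ignoring it.
            print("Could not remove", path, "-", exc)
    return kept
```

The kept files can then be loaded with pd.read_csv(path, dtype=str, skiprows=1, encoding='ISO-8859-1'), have their product codes replaced, and be written back as in the question.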

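Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.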