Delete CSV file if missing specific column using Python
Currently my code looks at the CSV files in a folder and replaces strings in each file that has a 'PROD_NAME' column. If a file doesn't have a 'PROD_NAME' column, I'm trying to delete it from the folder. With a little debugging I can get my code to print which CSV files do not have the column, but I can't figure out how to actually delete them from the folder they are in. I have tried an if statement that calls os.remove() and still nothing happens. No errors or anything: the script just finishes with all the files still in the folder. Here is my code. Any help is appreciated. Thanks!
def worker():
    filenames = glob.glob(dest_dir + '\\*.csv')
    print("Finding all files with column PROD_NAME")
    time.sleep(3)
    print("Changing names of products in these tables...")
    for filename in filenames:
        my_file = Path(os.path.join(dest_dir, filename))
        try:
            with open(filename):
                # read data
                df1 = pd.read_csv(filename, skiprows=1, encoding='ISO-8859-1')  # read column header only - to get the list of columns
                dtypes = {}
                for col in df1.columns:  # make all columns text, to avoid formatting errors
                    dtypes[col] = 'str'
                df1 = pd.read_csv(filename, dtype=dtypes, skiprows=1, encoding='ISO-8859-1')
                if 'PROD_NAME' not in df1.columns:
                    os.remove(filename)
                # Replaces text in files
                if 'PROD_NAME' in df1.columns:
                    df1 = df1.replace("NABVCI", "CLEAR_BV")
                    df1 = df1.replace("NAMVCI", "CLEAR_MV")
                    df1 = df1.replace("NA_NRF", "FA_GUAR")
                    df1 = df1.replace("N_FPFA", "FA_FLEX")
                    df1 = df1.replace("NAMRFT", "FA_SECURE_MVA")
                    df1 = df1.replace("NA_RFT", "FA_SECURE")
                    df1 = df1.replace("NSPFA7", "FA_PREFERRED")
                    df1 = df1.replace("N_ENHA", "FA_ENHANCE")
                    df1 = df1.replace("N_FPRA", "FA_FLEX_RETIRE")
                    df1 = df1.replace("N_SELF", "FA_SELECT")
                    df1 = df1.replace("N_SFAA", "FA_ADVANTAGE")
                    df1 = df1.replace("N_SPD1", "FA_SPD1")
                    df1 = df1.replace("N_SPD2", "FA_SPD2")
                    df1 = df1.replace("N_SPFA", "FA_LIFESTAGES")
                    df1 = df1.replace("N_SPPF", "FA_PLUS")
                    df1 = df1.replace("N__CFA", "FA_CHOICE")
                    df1 = df1.replace("N__OFA", "FA_OPTIMAL")
                    df1 = df1.replace("N_SCNI", "FA_SCNI")
                    df1 = df1.replace("NASCI_", "FA_SCI")
                    df1 = df1.replace("NASSCA", "FA_SSC")
                    df1.to_csv(filename, index=False, quotechar="'")
        except:
            if 'PROD_NAME' in df1.columns:
                print("Could not find string to replace in this file: " + filename)

worker()
Below is a block of code that reads the raw CSV data. It extracts the first row (containing the column names) and looks for the column name PROD_NAME. If it finds it, it sets found to True; otherwise, it sets found to False. To avoid trying to delete the file while it is still open, the removal is done outside the open() block. In the question's code, os.remove() is called while the file is still open inside the with block; on Windows that raises an exception, and the bare except: then swallows it, which would explain why nothing appears to happen.
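import os

filename = "test.csv"

with open(filename) as f:
    # Read only the header line; strip the trailing newline so that
    # PROD_NAME still matches when it is the last column.
    header = f.readline().strip().split(",")
    if "PROD_NAME" in header:
        print("found")
        found = True
    else:
        print("not found")
        found = False

# The file is closed here, so it is safe to delete it.
if not found:
    os.remove(filename)
else:
    pass  # Carry out replacements here / load it in pandas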
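To apply the same idea to every file in a folder, as the question does, the check and the removal could be wrapped in two small functions. This is only a sketch under some assumptions: the helper names has_column and remove_csvs_missing_column are hypothetical, the skiprows parameter mirrors the skiprows=1 used in the question's pd.read_csv calls, and the standard csv module is used instead of a plain split(",") so that quoted column names are handled.

```python
import csv
import glob
import os


def has_column(path, column, skiprows=0, encoding="ISO-8859-1"):
    """Return True if `column` appears in the header row of the CSV at `path`."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        for _ in range(skiprows):
            next(reader, None)  # discard rows that precede the header
        header = next(reader, [])  # an empty file yields an empty header
    # The with block has exited, so the file is closed before we return.
    return column in (name.strip() for name in header)


def remove_csvs_missing_column(folder, column, skiprows=0):
    """Delete every *.csv in `folder` that lacks `column`; return the deleted paths."""
    removed = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # has_column has closed the file by now, so os.remove is not
        # called on a file this script still holds open.
        if not has_column(path, column, skiprows=skiprows):
            os.remove(path)
            removed.append(path)
    return removed
```

With the question's setup this might be called as remove_csvs_missing_column(dest_dir, "PROD_NAME", skiprows=1) before running the replacements on the files that remain. There is deliberately no bare except: here, so a failed deletion surfaces as an error instead of passing silently.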