I'm trying to create multiple good/bad files from original .csv
files from a directory.
Im fairly new to Python, but have cobbled together the below, but it's not saving multiple files, just x1 "good" and x1 "bad" file. in the dir i have testfile1
and testfile2
. the output should be testfile1good testfile1bad testfile2good testfile2bad
.
Any help would be greatly appreciated.
Thanks
import pandas as pd
from string import ascii_letters
import glob
from pathlib import Path
files = glob.glob('C:\\Users\\nickn\\OneDrive\\Documents\\Well\\*.csv')
for f in files:
filename = []
filename = Path(f)
#Can not be null fields
df = pd.read_csv(f)
emptyvals = []
emptyvals = df['First Name'].isnull() | df['Last Name'].isnull()
#Bank Account Number is not 8 digits long
accountnolen = []
ac = []
accountnolen = df['AccNumLen'] = df['Bank Account Number'].astype(str).map(len)
ac = df[(df['AccNumLen'] != 8)]
acd= ac.drop(['AccNumLen'],axis=1)
#Create Exclusions
allexclusions = []
allexclusions = df[emptyvals].append(acd)
allexclusions.to_csv(filename.stem+"bad.csv",header =True,index=False)
#GoodList
#for f in files:
# filename = []
# filename = Path(f)
origlist = df
df = pd.merge(origlist, allexclusions, how='outer', indicator=True)
cl = df[(df['_merge'] == 'left_only')]
cld = cl.drop(['_merge','AccNumLen'],axis=1)
cld['Well ID'] = cld['Well ID'].str.rstrip(ascii_letters)
cld.to_csv(filename.stem+'good.csv',header =True,index=False)
i think you do loop but leave it and do the rest on line 14 - there you have filename set and you save your data once.
What you want is do the loop and the rest should happen for each iteration, so code should look like this:
import pandas as pd
from string import ascii_letters
import glob
from pathlib import Path
files = glob.glob('C:\\Users\\nickn\\OneDrive\\Documents\\Well\\*.csv')
for f in files:
filename = []
filename = Path(f)
#EDIT: we stay in loop and process each file one by one with following lines:
#Can not be null fields
df = pd.read_csv(f)
emptyvals = []
emptyvals = df['First Name'].isnull() | df['Last Name'].isnull()
#Bank Account Number is not 8 digits long
accountnolen = []
ac = []
accountnolen = df['AccNumLen'] = df['Bank Account Number'].astype(str).map(len)
ac = df[(df['AccNumLen'] != 8)]
acd= ac.drop(['AccNumLen'],axis=1)
#Create Exclusions
allexclusions = []
allexclusions = df[emptyvals].append(acd)
allexclusions.to_csv(filename.stem+"bad.csv",header =True,index=False)
#GoodList
#for f in files:
# filename = []
# filename = Path(f)
origlist = df
df = pd.merge(origlist, allexclusions, how='outer', indicator=True)
cl = df[(df['_merge'] == 'left_only')]
cld = cl.drop(['_merge','AccNumLen'],axis=1)
cld['Well ID'] = cld['Well ID'].str.rstrip(ascii_letters)
cld.to_csv(filename.stem+'good.csv',header =True,index=False)
In another words - you iterate over file names found in directory and THEN you take last "filename" and process it in one pass. By adding 4 spaces to rest of code we say to python interpreter that this part of code is part of loop and should be executed for each file. Hope it makes sense
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.