to_csv multiple dataframes from loop with filename

Question

I'm trying to create multiple good/bad files from original .csv files from a directory.

I'm fairly new to Python and have cobbled together the code below, but it's not saving multiple files, just one "good" and one "bad" file. In the directory I have testfile1 and testfile2. The output should be testfile1good, testfile1bad, testfile2good and testfile2bad.

Any help would be greatly appreciated.

Thanks

import pandas as pd
from string import ascii_letters
import glob
from pathlib import Path


files = glob.glob('C:\\Users\\nickn\\OneDrive\\Documents\\Well\\*.csv')


for f in files:
    filename = []
    filename = Path(f)

#Can not be null fields    
df = pd.read_csv(f)
emptyvals = []
emptyvals = df['First Name'].isnull() | df['Last Name'].isnull()

#Bank Account Number is not 8 digits long
accountnolen = []
ac = []
accountnolen = df['AccNumLen'] = df['Bank Account Number'].astype(str).map(len)
ac =  df[(df['AccNumLen'] != 8)]
acd= ac.drop(['AccNumLen'],axis=1)

#Create Exclusions
allexclusions = []
allexclusions = df[emptyvals].append(acd)
allexclusions.to_csv(filename.stem+"bad.csv",header =True,index=False)

#GoodList
#for f in files:
#    filename = []
#    filename = Path(f)
origlist = df
df = pd.merge(origlist, allexclusions, how='outer', indicator=True)
cl =  df[(df['_merge'] == 'left_only')]
cld = cl.drop(['_merge','AccNumLen'],axis=1)
cld['Well ID'] = cld['Well ID'].str.rstrip(ascii_letters)

cld.to_csv(filename.stem+'good.csv',header =True,index=False)

Answer 1

I think you run the loop but then leave it, and everything from line 14 onwards happens outside of it. By then filename is set to the last file found, so you save your data only once.
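To see the effect in isolation, here is a minimal sketch with made-up file names (the list and the `outside`/`inside` variables are only illustrative, they are not from your code). Anything dedented after the `for` runs once, after the loop has finished, and only sees the last value:

```python
from pathlib import Path

# Hypothetical file names standing in for the glob results
files = ["testfile1.csv", "testfile2.csv"]

outside = []
for f in files:
    filename = Path(f)  # runs once per file

# Dedented: runs once, after the loop, with the last filename only
outside.append(filename.stem + "bad.csv")

inside = []
for f in files:
    filename = Path(f)
    # Indented: runs once per file
    inside.append(filename.stem + "bad.csv")

print(outside)  # ['testfile2bad.csv']
print(inside)   # ['testfile1bad.csv', 'testfile2bad.csv']
```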

What you want is for the rest of the code to run on every iteration of the loop, so the code should look like this:

import pandas as pd
from string import ascii_letters
import glob
from pathlib import Path


files = glob.glob('C:\\Users\\nickn\\OneDrive\\Documents\\Well\\*.csv')


for f in files:
    filename = Path(f)

    #EDIT: we stay in loop and process each file one by one with following lines:

    #Can not be null fields
    df = pd.read_csv(f)
    emptyvals = df['First Name'].isnull() | df['Last Name'].isnull()

    #Bank Account Number is not 8 digits long
    df['AccNumLen'] = df['Bank Account Number'].astype(str).map(len)
    ac = df[df['AccNumLen'] != 8]

    #Create Exclusions
    #DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    #AccNumLen stays in until after the merge below, otherwise the rows with a
    #wrong account number length would not match and would end up in the good file.
    allexclusions = pd.concat([df[emptyvals], ac])
    allexclusions.drop(['AccNumLen'],axis=1).to_csv(filename.stem+"bad.csv",header=True,index=False)

    #GoodList: keep only the rows that are not in the exclusions
    df = pd.merge(df, allexclusions, how='outer', indicator=True)
    cl = df[df['_merge'] == 'left_only']
    cld = cl.drop(['_merge','AccNumLen'],axis=1)
    cld['Well ID'] = cld['Well ID'].str.rstrip(ascii_letters)

    cld.to_csv(filename.stem+'good.csv',header=True,index=False)

In other words: your original code iterates over the file names found in the directory and only THEN takes the last "filename" and processes it in one pass. By adding 4 spaces to the rest of the code, we tell the Python interpreter that this part belongs to the loop and should be executed for each file. Hope that makes sense.
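As a side note, the good/bad split could also be done with a single boolean mask instead of merging the exclusions back in. This is only a sketch under assumptions: the helper name `split_good_bad` and the sample rows are made up for illustration, and the column names are taken from the question.

```python
import pandas as pd
from string import ascii_letters


def split_good_bad(df):
    """Return (good, bad) frames using the rules from the question."""
    # Rule 1: first and last name must not be null
    empty_names = df['First Name'].isnull() | df['Last Name'].isnull()
    # Rule 2: bank account number must be exactly 8 characters long
    wrong_length = df['Bank Account Number'].astype(str).str.len() != 8
    bad_mask = empty_names | wrong_length

    bad = df[bad_mask]
    good = df[~bad_mask].copy()
    # Strip trailing letters from the Well ID, as in the original code
    good['Well ID'] = good['Well ID'].str.rstrip(ascii_letters)
    return good, bad


# Made-up sample data, only for illustration
sample = pd.DataFrame({
    'First Name': ['Ann', None, 'Cy'],
    'Last Name': ['Lee', 'Roe', 'Poe'],
    'Bank Account Number': [12345678, 12345678, 1234],
    'Well ID': ['W1A', 'W2B', 'W3C'],
})
good, bad = split_good_bad(sample)
print(good['Well ID'].tolist())  # ['W1']
print(len(bad))                  # 2
```

Inside the loop you would then call `good, bad = split_good_bad(pd.read_csv(f))` and write each frame with `to_csv(filename.stem + 'good.csv', index=False)` and `to_csv(filename.stem + 'bad.csv', index=False)` as before. Note that with this approach each bad row lands in the bad file only once, even if it breaks both rules.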

solution1
0 ACCEPTED 2020-07-31 13:36:36