to_csv multiple dataframes from loop with filename

I'm trying to create multiple good/bad files from the original .csv files in a directory.

I'm fairly new to Python and have cobbled together the code below, but it isn't saving multiple files, just one "good" and one "bad" file. The directory contains testfile1 and testfile2, so the output should be testfile1good, testfile1bad, testfile2good and testfile2bad.

Any help would be greatly appreciated.

Thanks

import pandas as pd
from string import ascii_letters
import glob
from pathlib import Path


files = glob.glob('C:\\Users\\nickn\\OneDrive\\Documents\\Well\\*.csv')


for f in files:
    filename = []
    filename = Path(f)

#Can not be null fields    
df = pd.read_csv(f)
emptyvals = []
emptyvals = df['First Name'].isnull() | df['Last Name'].isnull()

#Bank Account Number is not 8 digits long
accountnolen = []
ac = []
accountnolen = df['AccNumLen'] = df['Bank Account Number'].astype(str).map(len)
ac =  df[(df['AccNumLen'] != 8)]
acd= ac.drop(['AccNumLen'],axis=1)

#Create Exclusions
allexclusions = []
allexclusions = df[emptyvals].append(acd)
allexclusions.to_csv(filename.stem+"bad.csv",header =True,index=False)

#GoodList
#for f in files:
#    filename = []
#    filename = Path(f)
origlist = df
df = pd.merge(origlist, allexclusions, how='outer', indicator=True)
cl =  df[(df['_merge'] == 'left_only')]
cld = cl.drop(['_merge','AccNumLen'],axis=1)
cld['Well ID'] = cld['Well ID'].str.rstrip(ascii_letters)

cld.to_csv(filename.stem+'good.csv',header =True,index=False)

I think your loop runs, but then the code leaves it and does everything else, from line 14 onward, outside the loop. By that point filename is set to the last file, so the data is saved only once.
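
To see the effect in isolation, here is a stripped-down sketch with no pandas involved. The file names and the two lists are made up purely for illustration; they stand in for the real files and the to_csv calls.

```python
# Hypothetical file names, used only to illustrate the indentation issue
files = ["testfile1.csv", "testfile2.csv"]

saved_outside = []
for f in files:
    filename = f  # only this line belongs to the loop

# Not indented: runs once, after the loop, with the last value of filename
saved_outside.append(filename)

saved_inside = []
for f in files:
    filename = f
    saved_inside.append(filename)  # indented: runs once per file

print(saved_outside)  # ['testfile2.csv']
print(saved_inside)   # ['testfile1.csv', 'testfile2.csv']
```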

What you want is for the rest of the code to run on every iteration, so it has to be indented inside the loop. The code should look like this:

import pandas as pd
from string import ascii_letters
import glob
from pathlib import Path


files = glob.glob('C:\\Users\\nickn\\OneDrive\\Documents\\Well\\*.csv')


for f in files:
    filename = []
    filename = Path(f)

    #EDIT: we stay in the loop and process each file one by one with the following lines:

    #Can not be null fields    
    df = pd.read_csv(f)
    emptyvals = []
    emptyvals = df['First Name'].isnull() | df['Last Name'].isnull()
    
    #Bank Account Number is not 8 digits long
    accountnolen = []
    ac = []
    accountnolen = df['AccNumLen'] = df['Bank Account Number'].astype(str).map(len)
    ac =  df[(df['AccNumLen'] != 8)]
    acd= ac.drop(['AccNumLen'],axis=1)
    
    #Create Exclusions
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    allexclusions = pd.concat([df[emptyvals], acd])
    allexclusions.to_csv(filename.stem+"bad.csv",header =True,index=False)
    
    #GoodList
    #for f in files:
    #    filename = []
    #    filename = Path(f)
    origlist = df
    df = pd.merge(origlist, allexclusions, how='outer', indicator=True)
    cl =  df[(df['_merge'] == 'left_only')]
    cld = cl.drop(['_merge','AccNumLen'],axis=1)
    cld['Well ID'] = cld['Well ID'].str.rstrip(ascii_letters)
    
    cld.to_csv(filename.stem+'good.csv',header =True,index=False)

In other words, your code iterates over the file names found in the directory and only THEN takes the last "filename" and processes it in a single pass. Indenting the rest of the code by 4 spaces tells the Python interpreter that it is part of the loop and should be executed for each file. Hope that makes sense.
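
As a side note, the same good/bad split could probably be written more compactly with boolean masks. The following is only a sketch of that alternative, not the approach used above: the function names split_good_bad and process_folder are made up for illustration, the column names are taken from the question, and it assumes the files are written to the current working directory as in the original code. Unlike the append-based version, a row that fails both checks ends up in the bad file only once.

```python
from pathlib import Path
from string import ascii_letters

import pandas as pd


def split_good_bad(df):
    """Return (good, bad) frames using the two checks from the question."""
    # Rows where either name field is missing
    missing_name = df["First Name"].isnull() | df["Last Name"].isnull()
    # Rows where the account number is not 8 characters long
    wrong_length = df["Bank Account Number"].astype(str).str.len() != 8
    is_bad = missing_name | wrong_length

    bad = df[is_bad]
    good = df[~is_bad].copy()
    # Strip trailing letters from the Well ID, as in the original code
    good["Well ID"] = good["Well ID"].str.rstrip(ascii_letters)
    return good, bad


def process_folder(folder):
    """Write <stem>good.csv and <stem>bad.csv for every CSV in folder."""
    for path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(path)
        good, bad = split_good_bad(df)
        # Everything here is indented, so it runs once per input file
        good.to_csv(path.stem + "good.csv", index=False)
        bad.to_csv(path.stem + "bad.csv", index=False)
```

With the directory from the question, this would be called as process_folder('C:\\Users\\nickn\\OneDrive\\Documents\\Well').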
