How to read one file at a time from a folder that contains multiple CSV files in Python

I have a folder with hundreds of CSV files. For every file, I need to check whether a particular column (here, the wine column) is present. If the column is present, the file should be saved to a folder unchanged; if it is not present, the column should be added before the file is saved to the folder.

The problem is that my Python code reads multiple CSV files instead of one CSV file. I can't work out the pandas logic that reads a single file, checks whether the column is present, and saves the result to a folder.

Input File1

A   B   C   D   E   F   Distance    G   H   I   J   L   K
0   05:58.0 0   2869421 1400.862536 0   0   0.777879166 0   1   7   5   test

Input File2

A   B   C   D   E   F   Distance    wine    H   I   J   L   K
0   1/12/2021 4:05  0   2869421 15000   0   50  0.777879166 0   1   7   5   test2

As you can see, the wine column is not present in input file 1, so I need to do some operations on it, while in input file 2 the wine column is present, so I don't need to do any operation.
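To make the expected behavior concrete, here is a minimal sketch of the per-file check on two small in-memory tables. The helper name `ensure_wine_column` and the shortened sample data are hypothetical, chosen only for illustration; the inserted value `"0"` and the position right after `Distance` follow the code in the question.

```python
import io

import pandas as pd


def ensure_wine_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a frame that has a 'wine' column right after 'Distance'.

    Hypothetical helper: the name is not from the original post.
    """
    if "wine" not in df.columns:
        df = df.copy()  # leave the caller's frame untouched
        df.insert(df.columns.get_loc("Distance") + 1, "wine", "0")
    return df


# Shortened stand-ins for Input File1 (no wine) and Input File2 (has wine).
file1 = pd.read_csv(io.StringIO("A,Distance,G\n0,0,0.777879166\n"))
file2 = pd.read_csv(io.StringIO("A,Distance,wine,H\n0,50,0.777879166,0\n"))

print(list(ensure_wine_column(file1).columns))  # ['A', 'Distance', 'wine', 'G']
print(list(ensure_wine_column(file2).columns))  # ['A', 'Distance', 'wine', 'H']
```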

This is my code so far to loop through the files in the folder. However this loops through all the files:

def main(path_csv,path_save, verbose):
    if (".csv" in str(path_csv).lower()) and path_csv.is_file():
        csv_files = [Path(path_csv)]
    else:
        csv_files = list(Path(path_csv).glob("*.csv"))
    
    all_dfs_1 = pd.DataFrame()
            
    for fn in csv_files:
        all_dfs_1 = pd.read_csv(fn,header=0)
        #print(all_dfs_1)
        if 'wine' not in all_dfs_1.columns:
            all_dfs_1.insert(all_dfs_1.columns.get_loc('Distance')+1,'wine','0')
        all_dfs_1 = pd.DataFrame(all_dfs_1)
    x = os.path.splitext(fn.name)[0]    
    all_dfs_1.to_csv(os.path.join(path_save,f"{x}.csv"),index=False)   

How do I only loop through one file at a time?


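IIUC, indent the last two lines so that they run inside the for loop, and remove all_dfs_1 = pd.DataFrame() and all_dfs_1 = pd.DataFrame(all_dfs_1), which are redundant. With the save step outside the loop, only the last file that was read gets written: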

import os
from pathlib import Path

import pandas as pd


def main(path_csv, path_save, verbose):
    # Accept either a single CSV file or a folder of CSV files.
    path_csv = Path(path_csv)
    if (".csv" in str(path_csv).lower()) and path_csv.is_file():
        csv_files = [path_csv]
    else:
        csv_files = list(path_csv.glob("*.csv"))

    for fn in csv_files:
        # Read, check, and save one file per iteration.
        all_dfs_1 = pd.read_csv(fn, header=0)
        if 'wine' not in all_dfs_1.columns:
            # Add the missing column right after 'Distance'.
            all_dfs_1.insert(all_dfs_1.columns.get_loc('Distance') + 1, 'wine', '0')
        x = os.path.splitext(fn.name)[0]
        all_dfs_1.to_csv(os.path.join(path_save, f"{x}.csv"), index=False)
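The same idea can also be sketched with pathlib only, as a quick way to check that every file is read, fixed, and written in its own iteration. This is a sketch rather than a drop-in replacement: the function name `add_wine_to_each_csv` and its two parameters are hypothetical, and it assumes every input file has a `Distance` column.

```python
from pathlib import Path

import pandas as pd


def add_wine_to_each_csv(src_dir, dst_dir):
    """Process the CSV files in src_dir one at a time and save them to dst_dir.

    Hypothetical helper: returns the list of paths that were written.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for csv_path in sorted(src_dir.glob("*.csv")):
        df = pd.read_csv(csv_path)  # only this one file is in memory
        if "wine" not in df.columns:
            # Assumes a 'Distance' column exists, as in the sample inputs.
            df.insert(df.columns.get_loc("Distance") + 1, "wine", "0")
        out_path = dst_dir / csv_path.name
        df.to_csv(out_path, index=False)  # saved inside the loop, per file
        written.append(out_path)
    return written
```

Calling `add_wine_to_each_csv("input_folder", "output_folder")` (both folder names are placeholders) should leave one output file per input file, each with a wine column.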
