简体   繁体   中英

Batch Appending matching rows to csv files using python

I have a set of csv files and another csv file, GroundTruth2010_edited_copy.csv, which contains information I'd like to append to the end of the rows of the Set of files. The files contain information describing geologic samples. For all the files, including GroundTruth2010_edited_copy.csv, each row has an identifying 'rockid' that identifies the sample and the remainder of the row describes various parameters of the sample. I want to append corresponding information from GroundTruth2010_edited_copy.csv to the Set of csv files. That is, if the rows have the same 'rockid,' I want to combine them into a new row in a new csv file. Hence, there is a new csv file for each original csv file in the Set. Here is my code.

import os
import csv
#read in ground truth data
csvfilename='GroundTruth/GroundTruth2010_edited_copy.csv'
with open(csvfilename) as csvfile:
    rocreader=csv.reader(csvfile)
    path=os.getcwd()
    filenames = os.listdir(path)
    for filename in filenames:
        if filename.endswith('.csv'):
            #read csv files                   
            r=csv.reader(open(filename))
            new_data = []
            for row in r:
               rockid=row[-1]

                for krow in rocreader:
                    entry=krow[0]
                    newentry=entry[:5] +entry[6:] #remove extra '0' from middle of entry 

                    if newentry==rockid:
                        print('Ok!')
                        #append ground truth data
                        new_data.append([row, krow[1], krow[2], krow[3], krow[4]]) 

            #write csv files          
            newfilename = "".join(filename.split(".csv")) + "_GT.csv"
            with open(newfilename, "w") as f:
                writer = csv.writer(f)
                writer.writerows(new_data) 

The code runs and makes my new csv files, but they are all empty. The problem seems to be that my second 'if' statement is never true: the console never prints 'Ok!' I've tried troubleshooting for a bit, and been rather frustrated. Perhaps the most frustrating thing is that after the program finishes, if I enter

   rockid==newentry

The console returns 'True,' so it seems to me I should get at least one 'Ok!' for the final iteration. Can anyone help me find what's wrong?

Also, since my if statement is never true, there may also be a problem with the way I append 'new_data.'

You only open rocreader once, so when you try to use it later in the loop, you'll only get rows from it the first time through-- in the rest of the loop's runs, you're reading 0 rows (and of course getting no matches). To read it over and over, open and close it once for each time you need to use it.

But instead of re-scanning the Ground Truth file from disk (slow!) for every row of each of the other CSVs, you should read it once into a dictionary, so you can look up IDs in one step.

with open(csvfilename) as csvfile:
    rocreader=csv.reader(csvfile)
    rocindex = dict((row[-1], row) for row in rocreader)

Then for any key newentry , you can just check like this:

if newentry in rocindex:
    truth = rocindex[newentry]  
    # Merge it with the row that has key `newentry`

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM