
Batch appending matching rows to csv files using python

I have a set of csv files and another csv file, GroundTruth2010_edited_copy.csv, which contains information I'd like to append to the end of the rows of the set of files. The files contain information describing geologic samples. For all the files, including GroundTruth2010_edited_copy.csv, each row has a 'rockid' that identifies the sample, and the remainder of the row describes various parameters of the sample. I want to append the corresponding information from GroundTruth2010_edited_copy.csv to the set of csv files. That is, if two rows have the same 'rockid', I want to combine them into a new row in a new csv file. Hence, there is a new csv file for each original csv file in the set. Here is my code.

import os
import csv
#read in ground truth data
csvfilename='GroundTruth/GroundTruth2010_edited_copy.csv'
with open(csvfilename) as csvfile:
    rocreader=csv.reader(csvfile)
    path=os.getcwd()
    filenames = os.listdir(path)
    for filename in filenames:
        if filename.endswith('.csv'):
            #read csv files                   
            r=csv.reader(open(filename))
            new_data = []
            for row in r:
                rockid=row[-1]

                for krow in rocreader:
                    entry=krow[0]
                    newentry=entry[:5] +entry[6:] #remove extra '0' from middle of entry 

                    if newentry==rockid:
                        print('Ok!')
                        #append ground truth data
                        new_data.append([row, krow[1], krow[2], krow[3], krow[4]]) 

            #write csv files          
            newfilename = "".join(filename.split(".csv")) + "_GT.csv"
            with open(newfilename, "w") as f:
                writer = csv.writer(f)
                writer.writerows(new_data) 

The code runs and makes my new csv files, but they are all empty. The problem seems to be that my second 'if' statement is never true: the console never prints 'Ok!'. I've tried troubleshooting for a bit, and have been rather frustrated. Perhaps the most frustrating thing is that after the program finishes, if I enter

   rockid==newentry

The console returns 'True', so it seems to me I should get at least one 'Ok!' for the final iteration. Can anyone help me find what's wrong?

Also, since my if statement is never true, there may also be a problem with the way I append to 'new_data'.

You only open rocreader once, so when you try to use it later in the loop, you'll only get rows from it the first time through; in the rest of the loop's runs, you're reading 0 rows (and of course getting no matches). To read it over and over, open and close it once for each time you need to use it.
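The exhaustion behaviour can be seen in isolation with a small sketch. The in-memory `io.StringIO` buffer and its two sample rows are made up for illustration and stand in for the real ground-truth file:

```python
import csv
import io

# Hypothetical in-memory stand-in for the ground-truth file.
sample = io.StringIO("A0001,1.0\nA0002,2.0\n")
reader = csv.reader(sample)

first_pass = list(reader)   # consumes every row
second_pass = list(reader)  # the reader is already exhausted

print(len(first_pass))   # 2
print(len(second_pass))  # 0

# Rewinding the underlying file makes the rows available again.
sample.seek(0)
third_pass = list(csv.reader(sample))
print(len(third_pass))   # 2
```

This is why only the very first row of the first file ever gets compared against the ground-truth rows: every later inner loop iterates over nothing.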

But instead of re-scanning the Ground Truth file from disk (slow!) for every row of each of the other CSVs, you should read it once into a dictionary, so you can look up IDs in one step.

with open(csvfilename) as csvfile:
    rocreader=csv.reader(csvfile)
    # Key each ground-truth row by its ID (first column), with the extra '0' removed
    rocindex = dict((row[0][:5] + row[0][6:], row) for row in rocreader)

Then for any key newentry, you can just check like this:

if newentry in rocindex:
    truth = rocindex[newentry]  
    # Merge it with the row that has key `newentry`
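Putting the pieces together, the whole task might look like the sketch below. It is only one possible arrangement under assumptions taken from the question: the ground-truth ID is in the first column and needs the character at index 5 removed, the other files carry their 'rockid' in the last column, and ground-truth columns 1 to 4 are the ones to append. The function names (`normalize_id`, `load_ground_truth`, `merge_file`, `merge_all`) are hypothetical, and skipping files that already end in `_GT.csv` is an added assumption so that a rerun does not process its own output. Note that it writes `row + truth[1:5]` rather than `[row, ...]`, so each output row is flat instead of containing a nested list.

```python
import csv
import os


def normalize_id(entry):
    """Drop the character at index 5, as the question's code does."""
    return entry[:5] + entry[6:]


def load_ground_truth(path):
    """Read the ground-truth CSV once into a dict keyed by normalized ID."""
    with open(path, newline="") as f:
        return {normalize_id(row[0]): row for row in csv.reader(f) if row}


def merge_file(src, dst, index):
    """Copy rows of src whose last column matches an ID in index to dst."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            if not row:
                continue  # skip blank lines
            truth = index.get(row[-1])
            if truth is not None:
                # Original columns followed by ground-truth columns 1 to 4.
                writer.writerow(row + truth[1:5])


def merge_all(folder, truth_path):
    """Write a <name>_GT.csv next to every csv file in folder."""
    index = load_ground_truth(truth_path)
    for name in os.listdir(folder):
        # Assumption: skip output files left over from an earlier run.
        if name.endswith(".csv") and not name.endswith("_GT.csv"):
            src = os.path.join(folder, name)
            dst = os.path.join(folder, name[:-4] + "_GT.csv")
            merge_file(src, dst, index)
```

With the layout from the question, this would be called as `merge_all(os.getcwd(), 'GroundTruth/GroundTruth2010_edited_copy.csv')`.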
