Read two files and filter the second file based on a column of the first file

Question

I have an input file containing keywords and a CSV file that needs to be filtered on those keywords.

Here is my attempt to automate the task using Python.

import csv
with open('Input.txt', 'rb') as InputFile:
    with open('28JUL2017.csv', 'rb') as CM_File:
        read_Input=csv.reader(InputFile)
        for row1 in csv.reader(InputFile):
            #print row1

            read_CM=csv.reader(CM_File)
            next(read_CM, None)
            for row2 in csv.reader(CM_File):
                #print row2
                if row1[0] == row2[0] :

                    Output= row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                    print Output

I only get the first matching row from the file being filtered. I have tried various things but cannot work out where I am going wrong. Please point out my mistake.

Answer 1

read_Input and read_CM are essentially iterators over the underlying files. Once you have looped over one, you are done: you cannot iterate over it twice. In your code the inner loop reads CM_File to the end while handling the first keyword, so for every later keyword there is nothing left to read. If you insist on doing it your way, you have to rewind to the beginning of the file each time you start a new inner loop and "re-read" the CSV file. Here is a fix:

import csv
with open('file1.csv', 'rb') as InputFile:
    with open('file2.csv', 'rb') as CM_File:
        read_Input = csv.reader(InputFile)
        for row1 in read_Input:
            CM_File.seek(0)  # rewind to the beginning of the file
            read_CM = csv.reader(CM_File)
            next(read_CM, None)  # skip the first (header) line again
            for row2 in read_CM:
                if row1[0] == row2[0]:
                    Output = row2[0] + "," + row2[1] + "," + row2[5] + "," + row2[6]
                    print(Output)
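To see why the rewind is needed, here is a small self-contained demonstration. It is written for Python 3 and uses io.StringIO in place of a real file; the sample data and the data_rows helper are hypothetical. It shows that a second csv.reader on the same file object finds nothing once the first has reached the end, and that calling seek(0) before each pass restores the data.

```python
import csv
import io

# Hypothetical in-memory stand-in for CM_File: a header row plus two data rows.
cm_file = io.StringIO("SYMBOL,X\nABC,1\nXYZ,2\n")

def data_rows(f):
    """Rewind f, skip the header row, and return the remaining CSV rows."""
    f.seek(0)               # go back to the start of the file
    reader = csv.reader(f)
    next(reader, None)      # skip the header line
    return list(reader)

# Without seek(0): the first pass consumes the file, the second finds nothing.
first = list(csv.reader(cm_file))
second = list(csv.reader(cm_file))
print(len(first), len(second))  # 3 0

# With seek(0) before each pass, every pass sees the same data rows.
print(data_rows(cm_file) == data_rows(cm_file))  # True
```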

Instead of this, I would suggest reading each file only once and looping over the rows you have already read, rather than re-reading files. Also, instead of nested loops, build a list of "keywords" from the first file and simply check whether row2[0] is in that list:

import csv
with open('file1.csv', 'rb') as InputFile:
    with open('file2.csv', 'rb') as CM_File:
        read_Input = csv.reader(InputFile) # read file only once
        keywords = [rec[0] for rec in read_Input]
        read_CM = csv.reader(CM_File) # read file only once
        next(read_CM, None) # not sure why you do this? to skip first line?
        for row2 in read_CM:
            if row2[0] in keywords:
                Output = row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                print("Output: {}".format(Output))
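On Python 3, opening the files in 'rb' mode no longer works with the csv module; files should be opened in text mode with newline=''. Below is a minimal Python 3 sketch of the same read-once approach, assuming the keywords are in the first column of the input file and the first row of the data file is a header. It swaps the list for a set, so each membership test is a hash lookup rather than a scan of the whole list, and it writes rows with csv.writer instead of joining strings by hand. The function name filter_by_keywords and the sample data are hypothetical.

```python
import csv
import io
import sys

def filter_by_keywords(keyword_file, data_file, columns=(0, 1, 5, 6)):
    """Yield the selected columns of each data row whose first field is a keyword.

    keyword_file and data_file are open text files (or file-like objects).
    The first row of data_file is assumed to be a header and is skipped.
    """
    # Read the keyword file once; a set gives constant-time membership tests.
    keywords = {row[0] for row in csv.reader(keyword_file) if row}
    reader = csv.reader(data_file)
    next(reader, None)  # skip the header row
    for row in reader:
        if row and row[0] in keywords:
            yield [row[i] for i in columns]

# Hypothetical sample data standing in for Input.txt and 28JUL2017.csv.
keywords_txt = io.StringIO("ABC\nXYZ\n")
data_csv = io.StringIO(
    "SYMBOL,A,B,C,D,E,F\n"
    "ABC,1,2,3,4,5,6\n"
    "DEF,1,2,3,4,5,6\n"
    "XYZ,7,8,9,10,11,12\n"
)

rows = list(filter_by_keywords(keywords_txt, data_csv))
csv.writer(sys.stdout).writerows(rows)  # one CSV line per matching row
```

With real files you would open them as open('Input.txt', newline='') and open('28JUL2017.csv', newline='') and pass the resulting file objects to the function.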

Answer 1: accepted, 2017-07-31 18:34:35