
Read two files and filter the second file based on a column of the first file

I have an input file containing keywords and a CSV file that needs to be filtered on those keywords.

Here is my attempt to automate the task using Python.

import csv
with open('Input.txt', 'rb') as InputFile:
    with open('28JUL2017.csv', 'rb') as CM_File:
        read_Input=csv.reader(InputFile)
        for row1 in csv.reader(InputFile):
            #print row1

            read_CM=csv.reader(CM_File)
            next(read_CM, None)
            for row2 in csv.reader(CM_File):
                #print row2
                if row1[0] == row2[0] :

                    Output= row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                    print Output

I get just the first row from the file to be filtered. I have tried various things but cannot understand where I am going wrong. Please point out the mistake for me.

read_Input and read_CM are essentially iterators, and so are the file objects they wrap. Once you have looped over them, they are exhausted: you cannot iterate a second time. That is why the inner loop produces rows only on its first pass. If you insist on doing it your way, you have to rewind to the beginning of the file each time you start a new loop and "re-read" the CSV file. Here is a fix:

import csv
with open('file1.csv', 'rb') as InputFile:
    with open('file2.csv', 'rb') as CM_File:
        read_Input = csv.reader(InputFile)
        for row1 in read_Input:
            CM_File.seek(0)  # rewind to the beginning of the file
            read_CM = csv.reader(CM_File)
            next(read_CM, None)  # skip the first line (presumably a header)
            for row2 in read_CM:
                if row1[0] == row2[0]:
                    Output = row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                    print Output
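
The exhaustion behaviour can be seen in isolation with a small sketch. It assumes Python 3 and uses an in-memory io.StringIO object with made-up sample data in place of a real file; the variable names are illustrative only.

```python
import csv
import io

# In-memory stand-in for a CSV file (hypothetical sample data).
cm_file = io.StringIO("key,value\na,1\nb,2\n")

first_pass = list(csv.reader(cm_file))   # consumes the whole file
second_pass = list(csv.reader(cm_file))  # file position is at the end, nothing left

cm_file.seek(0)                          # rewind to the beginning
third_pass = list(csv.reader(cm_file))   # all rows are available again

print(len(first_pass), len(second_pass), len(third_pass))
```

Creating a new csv.reader does not reset anything by itself: the reader only pulls lines from the underlying file object, so the file position decides what it sees.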

Rather than rewinding and re-reading the file, I would suggest looping over lines that have already been read. Also, instead of nesting loops, create a list of "keywords" and simply check whether row2[0] is in that list:

import csv
with open('file1.csv', 'rb') as InputFile:
    with open('file2.csv', 'rb') as CM_File:
        read_Input = csv.reader(InputFile) # read file only once
        keywords = [rec[0] for rec in read_Input]
        read_CM = csv.reader(CM_File) # read file only once
        next(read_CM, None) # not sure why you do this? to skip first line?
        for row2 in read_CM:
            if row2[0] in keywords:
                Output = row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                print("Output: {}".format(Output))
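
A possible variation on the same idea, sketched for Python 3: keeping the keywords in a set makes each membership test a hash lookup instead of a scan of the list. The function name filter_rows, the column headings, and the sample rows below are all hypothetical, and the first line of the data is assumed to be a header.

```python
import csv
import io

def filter_rows(keyword_lines, data_lines, columns=(0, 1, 5, 6)):
    """Yield the selected columns of data rows whose first field is a keyword."""
    # Build the keyword set once; blank lines give empty records and are skipped.
    keywords = {rec[0] for rec in csv.reader(keyword_lines) if rec}
    reader = csv.reader(data_lines)
    next(reader, None)  # skip the first line (assumed to be a header)
    for row in reader:
        if row and row[0] in keywords:
            yield [row[i] for i in columns]

# Hypothetical sample data standing in for the two files.
keyword_file = io.StringIO("ABC\nXYZ\n")
data_file = io.StringIO(
    "SYMBOL,SERIES,OPEN,HIGH,LOW,CLOSE,LAST\n"
    "ABC,EQ,10,12,9,11,11.5\n"
    "DEF,EQ,20,22,19,21,21.5\n"
    "XYZ,EQ,30,32,29,31,31.5\n"
)

matches = list(filter_rows(keyword_file, data_file))
for row in matches:
    print(",".join(row))
```

With real files in Python 3, the two StringIO objects would be replaced by files opened in text mode with newline='', as the csv module documentation recommends. A row with fewer than seven fields would raise IndexError here, just as in the code above.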


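Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.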
