Read two files and filter the second file based on a column of the first file
I have an input file containing keywords and a CSV file that needs to be filtered on those keywords.
Here is my attempt to automate the task using Python.

    import csv

    with open('Input.txt', 'rb') as InputFile:
        with open('28JUL2017.csv', 'rb') as CM_File:
            read_Input=csv.reader(InputFile)
            for row1 in csv.reader(InputFile):
                #print row1
                read_CM=csv.reader(CM_File)
                next(read_CM, None)
                for row2 in csv.reader(CM_File):
                    #print row2
                    if row1[0] == row2[0] :
                        Output= row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                        print Output

I get just the first row from the file to be filtered. I tried various things but could not understand where I am going wrong. Please point out the mistake for me.

read_Input and read_CM are essentially iterators. Once you loop over them, you are done: you cannot iterate twice. The same holds for the underlying file objects: after the inner loop has run once, CM_File is at end of file, so every later pass over it yields nothing. If you insist on doing it your way, you have to rewind to the beginning of the file each time you start a new loop and re-read the CSV file. Here is a fix (Python 2 syntax, as in the question):

    import csv

    with open('file1.csv', 'rb') as InputFile:
        with open('file2.csv', 'rb') as CM_File:
            read_Input=csv.reader(InputFile)
            for row1 in csv.reader(InputFile):
                CM_File.seek(0) # rewind to the beginning of the file
                read_CM=csv.reader(CM_File)
                next(read_CM, None)
                for row2 in csv.reader(CM_File):
                    if row1[0] == row2[0] :
                        Output= row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                        print Output

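Why a second pass over the same file comes back empty can be seen in a small sketch. It is written for Python 3 and uses an in-memory io.StringIO buffer with made-up data as a stand-in for the CSV file (both are assumptions for illustration; the question reads real files under Python 2).

```python
import csv
import io

# In-memory stand-in for a CSV file (hypothetical data, not from the question).
data = io.StringIO("id,name\n1,alpha\n2,beta\n")

first_pass = list(csv.reader(data))   # consumes the underlying file object
second_pass = list(csv.reader(data))  # a new reader, but the file is already at EOF

data.seek(0)                          # rewind to the beginning
third_pass = list(csv.reader(data))   # the rows are available again

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 3 0 3
```

Creating a fresh csv.reader does not reset anything; only seek(0) on the file object does.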
Rather than rewinding and re-reading the file, I would suggest reading each file only once and working with the lines you have already read. Also, instead of using nested loops, create a list of keywords and simply check whether row2[0] is in that list:

    import csv

    with open('file1.csv', 'rb') as InputFile:
        with open('file2.csv', 'rb') as CM_File:
            read_Input = csv.reader(InputFile) # read file only once
            keywords = [rec[0] for rec in read_Input]

            read_CM = csv.reader(CM_File) # read file only once
            next(read_CM, None) # not sure why you do this; to skip the first line?
            for row2 in read_CM:
                if row2[0] in keywords:
                    Output = row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                    print("Output: {}".format(Output))

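A possible variation on the single-pass approach, sketched for Python 3: keep the keywords in a set, so each membership test does not scan a list, and let csv.writer build the output line. The helper name filter_rows, its columns parameter, and the sample data are hypothetical and chosen only for illustration; the column indices 0, 1, 5 and 6 come from the code above.

```python
import csv
import io
import sys

def filter_rows(keyword_lines, csv_lines, columns=(0, 1, 5, 6)):
    """Yield the selected columns of each CSV row whose first field is a keyword."""
    # Collect keywords once; "if rec" skips blank lines in the keyword file.
    keywords = {rec[0] for rec in csv.reader(keyword_lines) if rec}
    reader = csv.reader(csv_lines)
    next(reader, None)  # skip the first line, as the code above does
    for row in reader:
        if row and row[0] in keywords:
            yield [row[i] for i in columns]

# Hypothetical sample data standing in for the two files.
keyword_file = io.StringIO("B\nC\n")
cm_file = io.StringIO(
    "sym,c1,c2,c3,c4,c5,c6\n"
    "A,1,2,3,4,5,6\n"
    "B,7,8,9,10,11,12\n"
)

result = list(filter_rows(keyword_file, cm_file))
csv.writer(sys.stdout).writerows(result)
```

With real files under Python 3, the two arguments would be file objects opened in text mode with newline='', for example open('file2.csv', newline=''), instead of the 'rb' mode used in the Python 2 code.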