for loop only reads the first line from the file

Question

I have two files. The first is a list of items, one per line. The second is a TSV file with many items per line. Some lines in the second file contain items that are also listed in the first file, and I need to generate a list of those lines.

grep -f is being finicky for me, so I decided to write my own Python script. This is what I came up with:

The big list is the second file; the tiny list is the first file.

def main():
    desired_subset = []
    small_list = open('tiny_list.txt','r')
    big_list = open('big_list.tsv','r')
    for i in small_list.readlines():
        i = i.rstrip('\n')
        for big_line in big_list:
            if i in big_line:
                if i not in desired_subset:
                    desired_subset.append(big_line)
    print(desired_subset)
    print(len(desired_subset))

 
main()

The problem is that the for loop only reads through the first line. Any suggestions?

Answer 1

When you iterate over a file object (here, big_list) you consume it, so on the second iteration over small_list there is nothing left to read from big_list. Try reading big_list into a list with .readlines() before the main for loop and use that list instead:

def main():
    desired_subset = []
    small_list = open('tiny_list.txt','r')
    # Read every line into a list up front; a list can be looped over repeatedly.
    big_list = open('big_list.tsv','r').readlines() # note here
    for i in small_list.readlines():
        i = i.rstrip('\n')
        for big_line in big_list:
            if i in big_line:
                # Check the line itself (not the item) so no line is added twice.
                if big_line not in desired_subset:
                    desired_subset.append(big_line)
    print(desired_subset)
    print(len(desired_subset))


main()
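
To see the "consumed" behavior in isolation, here is a minimal sketch. It assumes io.StringIO as a stand-in for a real file (the sample contents are made up for illustration); file objects returned by open() are iterated in the same way. The second pass finds nothing because the read position is already at the end, and seek(0) rewinds it.

```python
import io

# io.StringIO stands in for an open text file (an assumption for this demo).
big_list = io.StringIO("a\tb\nc\td\n")

first_pass = [line for line in big_list]   # reads both lines
second_pass = [line for line in big_list]  # nothing left: already at the end

big_list.seek(0)                           # rewind to the beginning
third_pass = [line for line in big_list]   # reads both lines again

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 2 0 2
```

Calling seek(0) before each inner loop would also work with the original code, but reading the lines into a list once avoids re-reading the file for every item.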

Also, you never close your files, which is bad practice. I'd suggest using a context manager (open the files with a with statement):

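def main():
    desired_subset = []
    # The backslash continues the with statement onto the next line.
    with open('tiny_list.txt','r') as small_list, \
         open('big_list.tsv','r') as big_list:

        small_file_lines = small_list.readlines()
        big_file_lines = big_list.readlines()

    # Both files are closed here; their lines are already in memory.
    for i in small_file_lines:
        i = i.rstrip('\n')
        for big_line in big_file_lines:
            if i in big_line:
                # Check the line itself (not the item) so no line is added twice.
                if big_line not in desired_subset:
                    desired_subset.append(big_line)

    print(desired_subset)
    print(len(desired_subset))


main()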
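
The same matching logic can also be tried without any files on disk. The sketch below is one possible arrangement, not the only one: matching_lines is a hypothetical helper name, and the sample data is invented for illustration. It swaps the loops so that each line of the big list is visited once, which means a line cannot be appended twice just because it matches several items, and results come out in file order rather than item order. It also skips blank items, since an empty string is a substring of every line.

```python
def matching_lines(items, lines):
    """Return the lines that contain at least one of the items."""
    # Drop trailing newlines and skip blank items ('' is "in" every string).
    cleaned = [item.rstrip('\n') for item in items]
    cleaned = [item for item in cleaned if item]
    # Visit each line once; keep it if any item occurs in it as a substring.
    return [line for line in lines if any(item in line for item in cleaned)]


# Invented sample data standing in for tiny_list.txt and big_list.tsv.
items = ["apple\n", "kiwi\n", "\n"]
lines = ["1\tapple\tred\n", "2\tbanana\tyellow\n", "3\tkiwi\tapple\n"]

result = matching_lines(items, lines)
print(len(result))  # prints: 2
```

As in the original script, this is plain substring matching, so an item such as "apple" would also match a line containing "pineapple".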

Answer 1
0 · accepted · 2021-04-29 19:07:24
