
for loop is only reading the first line from a file

I have two files. The first is a list of items, one per line. The second is a TSV file with many items per line, so some of its lines contain items that may also appear in the first file. I need to generate a list of the lines from the second file that contain items listed in the first file.

grep -f is being finicky for me, so I decided to write my own Python script. This is what I came up with:

The big list is the second file; the tiny list is the first file.

def main():
    desired_subset = []
    small_list = open('tiny_list.txt','r')
    big_list = open('big_list.tsv','r')
    for i in small_list.readlines():
        i = i.rstrip('\n')
        for big_line in big_list:
            if i in big_line:
                if i not in desired_subset:
                    desired_subset.append(big_line)
    print(desired_subset)
    print(len(desired_subset))

 
main()

 

The problem is that the for loop only reads through the first line. Any suggestions?

When you iterate over a file (here, big_list) you "consume" it, so on the second iteration over small_list there is nothing left to read from big_list. Try reading big_list into a list with .readlines() before the main for loop, and iterate over that list instead:

def main():
    desired_subset = []
    small_list = open('tiny_list.txt', 'r')
    # Read the whole file into a list once so it can be scanned repeatedly.
    big_list = open('big_list.tsv', 'r').readlines()
    for i in small_list.readlines():
        i = i.rstrip('\n')
        for big_line in big_list:
            if i in big_line:
                # Compare the line (not the item) to avoid appending duplicates.
                if big_line not in desired_subset:
                    desired_subset.append(big_line)
    print(desired_subset)
    print(len(desired_subset))

main()
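
The "consumed" behaviour can be seen in isolation with a minimal sketch. It uses io.StringIO as a stand-in for an open text file (an assumption made only so the example needs no files on disk; the sample contents are made up). A second pass over the same file object yields nothing unless the position is rewound with seek(0).

```python
import io

# io.StringIO stands in for an open text file (illustrative data only).
big_list = io.StringIO("a\tb\nc\td\n")

first_pass = [line for line in big_list]    # reads both lines
second_pass = [line for line in big_list]   # already at end of file: empty

big_list.seek(0)                            # rewind to the start
third_pass = [line for line in big_list]    # both lines again

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 2 0 2
```

Rewinding inside the outer loop would also work, but it re-reads the file once per item, so reading the lines into a list once is usually the simpler choice when the file fits in memory.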

Also, you never close your files, which is bad practice. I'd suggest using a context manager (open the files with a with statement):

def main():
    desired_subset = []
    # The with statement closes both files when the block exits.
    with open('tiny_list.txt', 'r') as small_list, \
         open('big_list.tsv', 'r') as big_list:
        small_file_lines = small_list.readlines()
        big_file_lines = big_list.readlines()

    for i in small_file_lines:
        i = i.rstrip('\n')
        for big_line in big_file_lines:
            if i in big_line:
                # Compare the line (not the item) to avoid appending duplicates.
                if big_line not in desired_subset:
                    desired_subset.append(big_line)

    print(desired_subset)
    print(len(desired_subset))

main()
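
If the items should match whole tab-separated fields rather than arbitrary substrings, a set-based variant may be worth considering. This is only a sketch under that assumption, which the question does not confirm. The helper name matching_lines and the sample data are hypothetical. It also skips blank lines in the item list, since an empty string is a substring of every line.

```python
def matching_lines(items, lines):
    """Return lines, in order and without duplicates, that have a field in items."""
    # Strip whitespace and drop blank entries from the item list.
    wanted = {item.strip() for item in items if item.strip()}
    seen = set()
    result = []
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        # Keep the line if any of its fields is a wanted item.
        if line not in seen and not wanted.isdisjoint(fields):
            seen.add(line)
            result.append(line)
    return result

# Hypothetical sample data standing in for the two files.
items = ["apple\n", "\n", "kiwi\n"]
lines = ["apple\tpear\n", "pineapple\tplum\n", "fig\tkiwi\n", "apple\tpear\n"]
print(matching_lines(items, lines))  # ['apple\tpear\n', 'fig\tkiwi\n']
```

Unlike the substring test, "apple" does not match the "pineapple" line here, and each big line is checked once against a set instead of once per item.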


 