for loop is only reading the first line from a file

Question

I have two files. The first is a list of items, one per line. The second is a TSV file with many items per line, so some of its lines may contain items from the first file. I need to generate a list of the lines from the second file that contain any item listed in the first file.

grep -f is being finicky for me, so I decided to write my own Python script. This is what I came up with:

The big list is the second file (big_list.tsv); the tiny list is the first file (tiny_list.txt).

def main():
    desired_subset = []
    small_list = open('tiny_list.txt','r')
    big_list = open('big_list.tsv','r')
    for i in small_list.readlines():
        i = i.rstrip('\n')
        for big_line in big_list:
            if i in big_line:
                if i not in desired_subset:
                    desired_subset.append(big_line)
    print(desired_subset)
    print(len(desired_subset))

 
main()

The problem is that the for loop is only reading through the first line. Any suggestions?

Answer 1

When you iterate over a file object (here, big_list), you "consume" it: once the inner loop has finished for the first item, the file position is at the end, so on the second and later iterations of the small_list loop there is nothing left to read from big_list. Read big_list into a list with .readlines() before the main for loop and iterate over that list instead:

def main():
    desired_subset = []
    small_list = open('tiny_list.txt', 'r')
    big_list = open('big_list.tsv', 'r').readlines()  # read all lines into a list that can be looped over repeatedly
    for i in small_list.readlines():
        i = i.rstrip('\n')
        for big_line in big_list:
            if i in big_line:
                # check the line itself (not the item) to avoid adding duplicates
                if big_line not in desired_subset:
                    desired_subset.append(big_line)
    print(desired_subset)
    print(len(desired_subset))
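To see the "consumed" behaviour in isolation, here is a small sketch that uses io.StringIO as a stand-in for a real file (an assumption made only so the example needs no files on disk). A text file object is an iterator over its lines: a second loop over the same object yields nothing until you rewind it with seek(0) or, as above, read the lines into a list once.

```python
import io

# io.StringIO behaves like a text file opened for reading (stand-in for big_list.tsv)
big_list = io.StringIO("a\tb\nc\td\n")

first_pass = [line for line in big_list]   # reads every line; position is now at the end
second_pass = [line for line in big_list]  # nothing left to read, so this is empty

big_list.seek(0)                           # rewind to the start of the "file"
after_rewind = [line for line in big_list]

print(len(first_pass), len(second_pass), len(after_rewind))  # 2 0 2
```

Reading into a list is usually simpler than rewinding, because the inner loop can then run any number of times without touching the file again.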

Also, you never close your files, which is bad practice. I'd suggest using a context manager (open the files with a with statement), which closes them automatically when the block ends:

def main():
    desired_subset = []
    # the backslash continues the with statement onto the next line
    with open('tiny_list.txt', 'r') as small_list, \
         open('big_list.tsv', 'r') as big_list:
        small_file_lines = small_list.readlines()
        big_file_lines = big_list.readlines()
    # both files are closed here; their lines are already in memory

    for i in small_file_lines:
        i = i.rstrip('\n')
        for big_line in big_file_lines:
            if i in big_line:
                if big_line not in desired_subset:
                    desired_subset.append(big_line)

    print(desired_subset)
    print(len(desired_subset))
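Reading both files into lists fixes the bug, but the nested loops still compare every item against every line, and "big_line not in desired_subset" is a linear scan of the results so far. Here is a sketch of an alternative, assuming the same substring matching as the code above: load the small file once, then make a single pass over the big file and keep each line that contains any item. Because each big-file line is visited exactly once, no duplicate check is needed and the output stays in file order. The function name find_matching_lines is hypothetical; blank lines in the small file are skipped on purpose, because an empty string is a substring of every line and would otherwise match everything.

```python
from typing import Iterable, List


def find_matching_lines(items: Iterable[str], big_lines: Iterable[str]) -> List[str]:
    """Return the lines of big_lines that contain at least one item as a substring."""
    # Strip newlines and drop blank entries, since '' in line is always True.
    wanted = [item.rstrip('\n') for item in items if item.strip()]
    # One pass over the big file; each line is kept at most once, in file order.
    return [line for line in big_lines if any(item in line for item in wanted)]


def main():
    # File names are the ones from the question.
    with open('tiny_list.txt') as small_list, open('big_list.tsv') as big_list:
        desired_subset = find_matching_lines(small_list, big_list)
    print(desired_subset)
    print(len(desired_subset))
```

For example, find_matching_lines(['apple\n', 'kiwi\n'], ['apple\tred\n', 'pear\tgreen\n', 'kiwi\tbrown\n']) returns ['apple\tred\n', 'kiwi\tbrown\n']. If an item should only match a whole TSV column rather than any substring, splitting each line on '\t' and testing the fields against a set of items would be faster still, but that changes the behaviour, so check which one you actually need.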

solution1 0 ACCEPTED 2021-04-29 19:07:24

for loop is only reading the first line from a file

Question

1 answers

solution1 0 ACCPTED 2021-04-29 19:07:24

solution1
0 ACCPTED 2021-04-29 19:07:24