for loop is only reading the first line from a file
I have two files. The first is a list of items, one per line. The second is a TSV file with many items per line, so some lines in the second file contain items that might also be listed in the first file. I need to generate a list of the lines from the second file that contain items listed in the first file.
grep -f is being finicky for me, so I decided to write my own Python script. This is what I came up with:
Big list is the second file; tiny list is the first file.
def main():
    desired_subset = []
    small_list = open('tiny_list.txt','r')
    big_list = open('big_list.tsv','r')
    for i in small_list.readlines():
        i = i.rstrip('\n')
        for big_line in big_list:
            if i in big_line:
                if i not in desired_subset:
                    desired_subset.append(big_line)
    print(desired_subset)
    print(len(desired_subset))
main()
The problem is that the for loop is only reading through the first line. Any suggestions?
When you iterate over a file (here, over big_list), you consume it, so on the second iteration over small_list there is nothing left to read from big_list. The inner loop therefore only does any work for the first line of tiny_list.txt. Try reading big_list with .readlines() into a list before the main for loop and use that list instead:
def main():
    desired_subset = []
    small_list = open('tiny_list.txt','r')
    big_list = open('big_list.tsv','r').readlines() # note here
    for i in small_list.readlines():
        i = i.rstrip('\n')
        for big_line in big_list:
            if i in big_line:
                # check the line itself (not the item) to avoid duplicates
                if big_line not in desired_subset:
                    desired_subset.append(big_line)
    print(desired_subset)
    print(len(desired_subset))
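The exhaustion described above can be seen in isolation. This is a minimal sketch that uses io.StringIO as a stand-in for an open text file (an assumption made so the example needs no files on disk); the sample content is made up for illustration. It also shows seek(0) as another way to make the lines available again, although reading the file into a list once is usually simpler.

```python
import io

# io.StringIO stands in for an open text file; both yield lines when iterated.
big_list = io.StringIO("a\tb\nc\td\n")

first_pass = [line for line in big_list]   # consumes the stream
second_pass = [line for line in big_list]  # nothing is left to read

print(len(first_pass), len(second_pass))

# Rewinding to the start makes the lines available again.
big_list.seek(0)
third_pass = [line for line in big_list]
print(len(third_pass))
```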
Also, you don't close your files, which is bad practice. I'd suggest using a context manager (open the files with a with statement):
def main():
    desired_subset = []
    # the backslash continues the with statement onto the next line
    with open('tiny_list.txt','r') as small_list, \
         open('big_list.tsv','r') as big_list:
        small_file_lines = small_list.readlines()
        big_file_lines = big_list.readlines()
    # both files are closed here; the lines are already in memory
    for i in small_file_lines:
        i = i.rstrip('\n')
        for big_line in big_file_lines:
            if i in big_line:
                # check the line itself (not the item) to avoid duplicates
                if big_line not in desired_subset:
                    desired_subset.append(big_line)
    print(desired_subset)
    print(len(desired_subset))
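As a variation on the approach above, the loops can be swapped so that the big file is read only once and only the small list is held in memory. This is a sketch, not the original answer's method: matching_lines is a hypothetical helper name, and it assumes that blank lines in tiny_list.txt should be ignored (an empty string is a substring of every line, so it would otherwise match everything). Unlike the versions above, the result follows the line order of the big file.

```python
def matching_lines(items, lines):
    """Return the lines that contain at least one item, without duplicates."""
    # Strip whitespace and drop blank items, which would match every line.
    wanted = [item.strip() for item in items if item.strip()]
    seen = set()   # lines already collected, for fast duplicate checks
    result = []
    for line in lines:
        if line in seen:
            continue
        # Plain substring test, the same kind of match as "i in big_line".
        if any(item in line for item in wanted):
            seen.add(line)
            result.append(line)
    return result


def main():
    # File names are taken from the question.
    with open('tiny_list.txt') as small_list, open('big_list.tsv') as big_list:
        subset = matching_lines(small_list, big_list)
    print(subset)
    print(len(subset))
```

Calling main() runs it against the two files from the question. Because the match is a substring test, a short item such as "cat" also matches a line containing "category"; splitting each line on tabs and comparing whole fields would be stricter, if that is what is needed.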