Searching for strings in all lines of a text file: Python

I'm having a problem here and could use some help.

I have a text file with an ID number and a set of "descriptors" on each line. The descriptors may or may not be unique to each line (they can be used multiple times throughout the document).

I basically want to identify all the ID numbers that contain a certain descriptor. My code works, but it only finds the first occurrence of the descriptor instead of all of them. Is there a quick fix?

All the descriptors are in a list already. Example of the text file:

ID_45555 (tab) some irrelevant data (tab) **DESCRIPTOR1** DESCRIPTOR2 DESCRIPTOR3

ID_55555 (tab) some irrelevant data (tab) DESCRIPTOR200 **DESCRIPTOR1** DESCRIPTOR599

Code:

for line in file:
    line = line.strip()
    line = line.split("\t")
    IDNUMBER = line[0]
    DESCRIPTOR = line[2]
    for x in total_list:
        if x in DESCRIPTOR:
            print x, DESCRIPTOR

I'd suggest using a dict for this, with the descriptors as the keys and lists of the corresponding IDs as the values. You go through the file and, at each line, append the ID to the list stored in the dictionary under each of that line's descriptors. For example:

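import collections

by_descriptors = collections.defaultdict(list)
for line in file:
    # each line is: ID <tab> irrelevant data <tab> space-separated descriptors
    id_number, _, descriptors = line.strip().split("\t")
    for d in descriptors.split():
        by_descriptors[d].append(id_number)
# to find all IDs for a given descriptor, e.g. "DESCRIPTOR1":
by_descriptors.get("DESCRIPTOR1", [])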
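
As a rough, self-contained sketch of the same idea: the function name index_by_descriptor and the in-memory sample data are made up for illustration, and the two sample lines are modelled on the example in the question. It assumes every useful line has at least three tab-separated fields and simply skips anything else.

```python
import collections
import io


def index_by_descriptor(lines):
    """Map each descriptor to the list of IDs whose line contains it."""
    by_descriptors = collections.defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip lines without an ID, data and descriptor field
        id_number, descriptors = fields[0], fields[2]
        # Descriptors are assumed to be separated by whitespace.
        for d in descriptors.split():
            by_descriptors[d].append(id_number)
    return by_descriptors


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "ID_45555\tsome irrelevant data\tDESCRIPTOR1 DESCRIPTOR2 DESCRIPTOR3\n"
    "ID_55555\tsome irrelevant data\tDESCRIPTOR200 DESCRIPTOR1 DESCRIPTOR599\n"
)
index = index_by_descriptor(sample)
print(index.get("DESCRIPTOR1", []))  # ['ID_45555', 'ID_55555']
print(index.get("DESCRIPTOR2", []))  # ['ID_45555']
print(index.get("DESCRIPTOR9", []))  # []
```

Because each line is split into whole descriptors before indexing, a lookup for DESCRIPTOR1 does not also match DESCRIPTOR100, which a substring test such as `x in DESCRIPTOR` would. Using `.get(name, [])` rather than `index[name]` also avoids adding empty entries to the defaultdict for descriptors that never occur.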
