
Searching for strings in all lines of a text file: Python

I'm having a problem here and could use some help.

I have a text file with an ID number and a set of "descriptors" on each line. The descriptors may or may not be unique to each line (they can be used multiple times throughout the document).

I want to identify all the ID numbers whose lines contain a certain descriptor. My code works, but it only finds the first occurrence of the descriptor instead of all of them. Is there a quick fix?

All the descriptors are in a list already. Example of the text file:

ID_45555 (tab) some irrelevant data (tab) **DESCRIPTOR1** DESCRIPTOR2 DESCRIPTOR3

ID_55555 (tab) some irrelevant data (tab) DESCRIPTOR200 **DESCRIPTOR1** DESCRIPTOR599

Code:

for line in file:
    line = line.strip()
    line = line.split("\t")
    IDNUMBER = line[0]
    DESCRIPTOR = line[2]
    for x in total_list:
        if x in DESCRIPTOR:
            print x, DESCRIPTOR

I'd suggest using a dict for this, with the descriptors as the keys and lists of the corresponding IDs as the values. You go through the file once and, for each line, append that line's ID to the list stored in the dictionary under each of its descriptors. For example:

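import collections

by_descriptors = collections.defaultdict(list)
for line in file:  # `file` is the already-opened text file
    # each line: ID <tab> irrelevant data <tab> space-separated descriptors
    id_number, _, descriptors = line.strip().split("\t")
    for d in descriptors.split():
        by_descriptors[d].append(id_number)

# to find all IDs for a given descriptor (look up by descriptor, not by ID):
by_descriptors.get("DESCRIPTOR1", [])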
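To try the idea without a real file, here is a minimal, self-contained sketch. It assumes the three tab-separated fields shown in the question; the helper name index_by_descriptor, the in-memory sample data, and the extra entry DESCRIPTOR7 are made up for illustration and are not from the original post.

```python
import collections
import io


def index_by_descriptor(lines):
    """Map each descriptor to the list of IDs whose line contains it."""
    by_descriptors = collections.defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines (an assumption, not in the post)
        id_number, descriptors = fields[0], fields[2]
        for d in descriptors.split():
            by_descriptors[d].append(id_number)
    return by_descriptors


# Hypothetical stand-in for the real file, using the two example lines.
sample = io.StringIO(
    "ID_45555\tsome irrelevant data\tDESCRIPTOR1 DESCRIPTOR2 DESCRIPTOR3\n"
    "ID_55555\tsome irrelevant data\tDESCRIPTOR200 DESCRIPTOR1 DESCRIPTOR599\n"
)
index = index_by_descriptor(sample)

# total_list plays the role of the descriptor list from the question.
total_list = ["DESCRIPTOR1", "DESCRIPTOR599", "DESCRIPTOR7"]
for x in total_list:
    print(x, index.get(x, []))
```

Note that this matches whole descriptors only. The substring test in the question (if x in DESCRIPTOR) would also report DESCRIPTOR1 for a line that only contains DESCRIPTOR10, which may or may not be what you want.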
