
Within a loop print each element from a list on one line per loop

I have a file like this:

1:200-320    ['gene_id "xyz";transcript_id "xyzt"; exon_number "1"\n', 'gene_id "xyz";transcript_id "xyzt2"; exon_number "2"\n']
1:3000-3200    ['gene_id "xyz";transcript_id "xy"; exon_number "2"\n']

It is extremely messy, and I want to tidy it up by first grouping terms, i.e. pull out the transcript_ids and write them as transcript_id xyzt, xyzt2; and eventually repeat that for all the other terms.

My approach was to first remove all the messy characters using replace:

out=open('foo.txt','w')
with open('in.txt', 'r') as f:
    for line in f:
        tidyline = line.replace('[', "").strip()
        tidyline = tidyline.replace(']', "").strip()
        tidyline = tidyline.replace('"', "").strip()
        tidyline = tidyline.replace("'", "").strip()
        tidyline = tidyline.replace(",", "").strip()
        out.write("%s\n" %tidyline)

Then I use re to match the strings and pull out this information. That part works; I am just not sure how to write the results to a file so that they stay on the appropriate lines.

import re

with open('foo.txt', 'r') as f:
    for line in f:
        result = re.findall(r'transcript_id\s(\w+)', line)
        print(result)

This prints:

['xyzt', 'xyzt2']
['xy']

My idea was to do something like:

string= "transcript_id %s,%s" %(results[0], results[1])
file.write("%s\n" %string)

but because the list for each line has a different length, that doesn't work.

The last of your problems (writing lists of variable length) can be solved with the join method of strings. Try this:

s = "transcript_id " + ",".join(results)
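As a minimal sketch of how this could slot into the loop from the question: `join` treats a list of one, two, or any number of matches the same way, so no indexing into the list is needed. The helper name `format_ids` and the sample lines are made up for illustration; the sample lines imitate what the cleaned foo.txt would contain.

```python
import re

def format_ids(line):
    """Return 'transcript_id a,b,...' for one cleaned line (hypothetical helper)."""
    result = re.findall(r'transcript_id\s(\w+)', line)
    # join works for any list length, including a single element
    return "transcript_id " + ",".join(result)

# Sample lines as they would look after the replace() clean-up.
# The raw strings keep the literal backslash-n that is present in the file.
cleaned = [
    r"1:200-320    gene_id xyz;transcript_id xyzt; exon_number 1\n gene_id xyz;transcript_id xyzt2; exon_number 2\n",
    r"1:3000-3200    gene_id xyz;transcript_id xy; exon_number 2\n",
]

formatted = [format_ids(line) for line in cleaned]
for text in formatted:
    print(text)
```

Writing `text` followed by a newline to the output file inside the loop, instead of printing it, keeps each group on the line it came from.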

To be on the safe side with your file operations, you should move the opening of the output file into the with statement, so that no file is left unclosed:

with open('in.txt', 'r') as f, open('foo.txt','w') as out:
    ...
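For illustration, here is one way the clean-up loop from the question might look with both files managed by the same with statement. This is only a sketch: `clean_line` is a hypothetical helper, and the demo writes its files into a temporary directory so it can run anywhere; with real data you would open 'in.txt' and 'foo.txt' directly.

```python
import os
import tempfile

def clean_line(line):
    """Strip the list/quote characters the question removes (hypothetical helper)."""
    for ch in '[]"\',':
        line = line.replace(ch, "")
    return line.strip()

# Demo setup only: create a small input file in a temporary directory.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "in.txt")
out_path = os.path.join(workdir, "foo.txt")
with open(in_path, "w") as f:
    f.write(r'''1:3000-3200    ['gene_id "xyz";transcript_id "xy"; exon_number "2"\n']''' + "\n")

# Both files are closed automatically when the block ends,
# even if an exception is raised inside it.
with open(in_path, "r") as f, open(out_path, "w") as out:
    for line in f:
        out.write("%s\n" % clean_line(line))
```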

Do you really need the intermediate step of writing foo.txt, or is this just a workaround?
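If the intermediate file is not needed, the IDs can probably be extracted from the raw lines directly by letting the regular expression account for the quotes. The sketch below assumes every ID is written as `transcript_id "value"` and that the region is the first whitespace-separated field. The function name `group_transcripts` and the tab-separated output layout are choices made here for illustration, not something the question prescribes.

```python
import io
import re

# Matches transcript_id "value" in the raw, uncleaned line.
TRANSCRIPT_RE = re.compile(r'transcript_id\s+"(\w+)"')

def group_transcripts(src, dst):
    """Read raw lines from src, write 'region<TAB>transcript_id a,b' to dst."""
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        region = line.split(None, 1)[0]
        ids = TRANSCRIPT_RE.findall(line)
        dst.write("%s\ttranscript_id %s\n" % (region, ",".join(ids)))

# In-memory demo using the two lines from the question.
raw_text = r'''1:200-320    ['gene_id "xyz";transcript_id "xyzt"; exon_number "1"\n', 'gene_id "xyz";transcript_id "xyzt2"; exon_number "2"\n']
1:3000-3200    ['gene_id "xyz";transcript_id "xy"; exon_number "2"\n']
'''
out = io.StringIO()
group_transcripts(io.StringIO(raw_text), out)
result_text = out.getvalue()
print(result_text)

# With real files ('out.txt' is a placeholder name):
# with open('in.txt') as f, open('out.txt', 'w') as out_file:
#     group_transcripts(f, out_file)
```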

I hope this helps.

You can put all results in one list and then go through it:

import re

transcript_id_list = []
with open('foo.txt', 'r') as f:
    for line in f:
        result = re.findall(r'transcript_id.*?(\w+)', line)
        if result:
            transcript_id_list.extend(result)

# 'out.txt' is a placeholder name for the output file
with open('out.txt', 'w') as out:
    for item in transcript_id_list:
        out.write("transcript_id %s\n" % item)
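Note that the loop above writes one ID per output line. If the values should stay grouped by the input line they came from, and the same treatment should apply to every term (as the question eventually wants), one possible sketch follows. It assumes every field in the raw line has the form `key "value"` and that repeated values within a line should be listed only once; both are assumptions, and `group_terms` and `format_terms` are hypothetical names.

```python
import re
from collections import OrderedDict

# Matches any key "value" pair, e.g. gene_id "xyz".
FIELD_RE = re.compile(r'(\w+)\s+"([^"]*)"')

def group_terms(line):
    """Map each key on one raw line to its distinct values, in order of appearance."""
    grouped = OrderedDict()
    for key, value in FIELD_RE.findall(line):
        values = grouped.setdefault(key, [])
        if value not in values:  # assumption: drop repeated values
            values.append(value)
    return grouped

def format_terms(grouped):
    """Render the grouped terms as 'key v1,v2; key v1; ...'."""
    return "; ".join(
        "%s %s" % (key, ",".join(values)) for key, values in grouped.items()
    )

sample = r'''1:200-320    ['gene_id "xyz";transcript_id "xyzt"; exon_number "1"\n', 'gene_id "xyz";transcript_id "xyzt2"; exon_number "2"\n']'''
summary = format_terms(group_terms(sample))
print(summary)
```

Calling this once per input line and writing `summary` followed by a newline keeps one output line per input line.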


 