
Python - Print horizontally two strings, with |

I'm having a small formatting issue that I can't seem to solve. I have some long strings in the form of DNA sequences. I added each one to a separate list, with each letter as an individual item. The sequences are of unequal length, so I appended "N"s to the shorter of the two.

Ex:

seq1 = ['A', 'T', 'G', 'G', 'A', 'C', 'G', 'C', 'A']
seq2 = ['A', 'T', 'G', 'G', 'C', 'T', 'G']

seq2 became: ['A', 'T', 'G', 'G', 'C', 'T', 'G', 'N', 'N']

Currently, after comparing the letter in each list I get:

ATGG--G--

where '-' is a mismatch in the letters (including the "N"s).

Ideally what I would like to print is:

  seq1  ATGGACGCA
        |||||||||
  seq2  ATGG--G--

I've been playing around with newline characters and commas at the end of print statements, but I can't get it to work. I would like to print an identifier for each sequence on the same line as its sequence.

Here's the function used to compare the two seqs:

def align_seqs(orf, query):
        orf_base = list(orf)
        query_base = list(query)

        if len(query_base) > len(orf_base):
                N = (len(query_base) - len(orf_base))
                for i in range(N):
                        orf_base.append("N")
        elif len(query_base) < len(orf_base):
                N = (len(orf_base) - len(query_base))
                for i in range(N):
                        query_base.append("N")
        align = []

        for i in range(0, len(orf_base)):
                if orf_base[i] == query_base[i]:
                        align.append(orf_base[i])
                else:
                        align.append("-")

        print(''.join(align))

At the present time, I'm just printing the "bottom" portion of what I want to print.

All help is appreciated.

If I understand correctly, this is a formatting question. I recommend looking at str.format(). Assuming you can get your sequences into strings (as you did for seq2 with align), try:

seq1 = 'ATGGACGCA'
seq2 = 'ATGG--G--'

print(' seq1: {}\n       {}\n seq2: {}'.format(seq1, len(seq1)*'|', seq2))

A little hacky, but it gets the job done. The arguments of format() replace the {} placeholders in the given string, in order. I get:

 seq1: ATGGACGCA
       |||||||||
 seq2: ATGG--G--
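To connect this with the question's lists, here is a minimal sketch that pads the shorter sequence with "N", masks mismatches with "-", and then feeds both strings to the same format() call. The names mask_mismatches and format_alignment are hypothetical helpers, not part of the original code, and the sketch assumes Python 3 (itertools.zip_longest).

```python
from itertools import zip_longest


def mask_mismatches(orf, query, pad="N"):
    """Pad the shorter sequence with `pad`, then mask mismatches with '-'.

    Returns (padded_orf, masked), both as strings of equal length.
    """
    pairs = list(zip_longest(orf, query, fillvalue=pad))
    padded_orf = "".join(a for a, _ in pairs)
    masked = "".join(a if a == b else "-" for a, b in pairs)
    return padded_orf, masked


def format_alignment(top, bottom):
    """Build the three-line display using the format string shown above."""
    return " seq1: {}\n       {}\n seq2: {}".format(top, "|" * len(top), bottom)


top, bottom = mask_mismatches("ATGGACGCA", "ATGGCTG")
print(format_alignment(top, bottom))
```

Returning strings rather than printing inside the comparison function keeps the comparison and the layout separate, so the layout can be changed without touching the comparison.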

You could always try something simple like the following, which does not assume the sequences are the same size; you can adjust it as you see fit.

def printSequences(seq1, seq2):
    print('seq1',seq1)
    print('    ','|'*max(len(seq1),len(seq2)))
    print('seq2',seq2)
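If the identifiers are not all the same length, the hard-coded padding above stops lining up. One possible adjustment, as a sketch only: pad every label to a common width with a format specification. The function name format_labeled and its parameters are assumptions made for illustration.

```python
def format_labeled(label1, seq1, label2, seq2):
    """Return a three-line display with labels padded to a common width."""
    width = max(len(label1), len(label2))
    bars = "|" * max(len(seq1), len(seq2))
    lines = [
        "{:<{w}} {}".format(label1, seq1, w=width),
        "{:<{w}} {}".format("", bars, w=width),  # blank label keeps bars aligned
        "{:<{w}} {}".format(label2, seq2, w=width),
    ]
    return "\n".join(lines)


print(format_labeled("orf", "ATGGACGCA", "query", "ATGG--G--"))
```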

So, here's a solution for you that works with long strings:

s1 = 'ATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGG'
s2 = 'A-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGG'

#assumes both sequences are of same length (post-alignment)
def print_align(seq1, seq2, length):
    # reserve 6 columns of each line for the "seqN: " label
    width = length - 6
    while len(seq1) > 0:
        print("seq1: " + seq1[:width])
        print("      " + '|' * len(seq1[:width]))
        print("seq2: " + seq2[:width] + "\n")
        seq1 = seq1[width:]
        seq2 = seq2[width:]

print_align(s1, s2, 30)

The output is:

seq1: ATAAGGATAAGGATAAGGATAAGG
      ||||||||||||||||||||||||
seq2: A-AAGGA-AAGGA-AAGGA-AAGG

seq1: ATAAGGATAAGGATAAGGATAAGG
      ||||||||||||||||||||||||
seq2: A-AAGGA-AAGGA-AAGGA-AAGG

seq1: ATAAGGATAAGG
      ||||||||||||
seq2: A-AAGGA-AAGG

Which I believe is what you want. You can adjust the length parameter to get the lines to display properly: each line, including its 6-character "seq1: " label, is cut off at the width given by that parameter. For example, if I call print_align(s1, s2, 39) I get:

seq1: ATAAGGATAAGGATAAGGATAAGGATAAGGATA
      |||||||||||||||||||||||||||||||||
seq2: A-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-A

seq1: AGGATAAGGATAAGGATAAGGATAAGG
      |||||||||||||||||||||||||||
seq2: AGGA-AAGGA-AAGGA-AAGGA-AAGG

This gives a much more readable result when you try it with huge (>1000 bp) sequences.

Note that the function takes two sequences of the same length as input, so this is just to print it nicely after you've done all the hard aligning work.

PS: In sequence alignment, one generally displays the bar | only for matching nucleotides. The solution is pretty easy and you should be able to figure it out (if you have trouble, though, let me know).
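For reference, one way the match-only variant might look. This is a sketch under assumptions, not the answerer's own solution: match_line and format_align are hypothetical names, both sequences are assumed to be the same length, and width here counts sequence characters per line directly rather than subtracting the label as print_align does.

```python
def match_line(seq1, seq2):
    """Return '|' where the two sequences agree and ' ' where they differ."""
    return "".join("|" if a == b else " " for a, b in zip(seq1, seq2))


def format_align(seq1, seq2, width):
    """Split two equal-length sequences into blocks of `width` characters."""
    blocks = []
    for start in range(0, len(seq1), width):
        top = seq1[start:start + width]
        bottom = seq2[start:start + width]
        blocks.append(
            "seq1: {}\n      {}\nseq2: {}".format(top, match_line(top, bottom), bottom)
        )
    return "\n\n".join(blocks)


print(format_align("ATAAGGATAAGG", "A-AAGGA-AAGG", 6))
```

Because a gap character such as "-" never equals a nucleotide, gapped positions get a space instead of a bar.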
