traceback in global sequence alignment

I am having trouble tracing back a global sequence alignment. My first sequence is ATTGCGCGCAT and my second sequence is ATGCTTAACCA. The traceback result should be:

ATTGC___GCGCAT
A_TGCTTAAC_CA_

but the code I am using does not produce that result.

I tried searching Google for references, but there are few examples of traceback written in Java.

private static void traceback(String seqOne, String seqTwo, int[][] matrix) {
    StringBuilder s1 = new StringBuilder(), s2 = new StringBuilder();

    for (int i = seqOne.length(), j = seqTwo.length(); i > 0 && j > 0; ) {
        if (i > 0 && j > 0 && (matrix[i][j] == matrix[i - 1][j - 1])) {
            s1.append(seqOne.charAt(--i));
            s2.append(seqTwo.charAt(--j));
        } else if (i > 0 && (matrix[i][j] == matrix[i - 1][j] + 1)) {
            s1.append(seqOne.charAt(--i));
            s2.append("-");
        } else if (j > 0 && (matrix[i][j] == matrix[i][j - 1] + 1)) {
            s2.append(seqTwo.charAt(--j));
            s1.append("-");
        }
    }

    System.out.println();
    System.out.println(s1.reverse().toString());
    System.out.println(s2.reverse().toString());
}

This is the expected answer, which I obtained from an online simulator:

[Image: traceback]

The example below is from Paul A. Gagniuc, Algorithms in Bioinformatics: Theory and Implementation, John Wiley & Sons, Hoboken, NJ, USA, 2021, ISBN: 9781119697961, which shows a correct implementation of pairwise sequence alignment. Let's consider the two DNA sequences you gave as an example:

// Variable statement
var Match = +2;
var Mismatch = -1;
var gap = -2;

var s0 = 'ATTGCGCGCAT';
var s1 = 'ATGCTTAACCA';

var AlignmentA = "";
var AlignmentM = "";
var AlignmentB = "";

var e = '&emsp;';
var m = [];
var s = [];

var MMax = 0;
var MMin = 0;
var x = 0;
var y = 0;

// Matrix initialization and completion
s[0] = [] = s0.split('');
s[1] = [] = s1.split('');

var n_0 = s[0].length + 1;
var n_1 = s[1].length + 1;

for(var i=0; i<=n_0; i++) {
    m[i]=[];
    for(var j=0; j<=n_1; j++) {
        m[i][j]=0;

        if (i==1 && j>1) {m[i][j]=m[i][j-1]+gap;}
        if (j==1 && i>1) {m[i][j]=m[i-1][j]+gap;}

        if (i>1) {m[i][0]=s[0][i-2];}
        if (j>1) {m[0][j]=s[1][j-2];}

        if(i>1 && j>1){
            var A = m[i-1][j-1] + f(m[i][0],m[0][j]); //'\\
            var B = m[i-1][j] + gap; //'-
            var C = m[i][j-1] + gap; //'|
            var D = 0;
            m[i][j] = Math.max(A, B, C, D);
            if(m[i][j] > MMax){MMax = m[i][j];x=i;y=j;}
            if(m[i][j] < MMin){MMin = m[i][j];}
        }
    }
}

//Traceback & text alignment
var i = x;
var j = y;

while (i>=2 || j>=2) {
    var Ai = m[i][0];
    var Bj = m[0][j];

    A = m[i-1][j-1] + f(Ai, Bj);
    B = m[i-1][j] + gap;
    C = m[i][j-1] + gap;

    if(i>=2 && j>=2 && m[i][j]==A) {
        AlignmentA = Ai + AlignmentA;
        AlignmentB = Bj + AlignmentB;

        if(Ai==Bj){
            AlignmentM = '|' + AlignmentM;
        } else {
            AlignmentM = e + AlignmentM;
        }

        i = i - 1;
        j = j - 1;
    } else {
        if(i>=2 && m[i][j]==B) {
            AlignmentA = Ai + AlignmentA;
            AlignmentB = '-' + AlignmentB;
            AlignmentM = e + AlignmentM;
            i = i - 1;
        } else {
            AlignmentA = '-' + AlignmentA;
            AlignmentB = Bj + AlignmentB;
            AlignmentM = e + AlignmentM;
            j = j - 1;
        }
    }

    var r1 = i - 1;
    var r2 = j - 1;

    if(m[i][j]<=0){break;}
}

// LAYOUT
var tM='';
var tS='';

// Check the end
AlignmentA = AlignmentA + s0.substr(x-1, n_0 - x);
AlignmentB = AlignmentB + s1.substr(y-1, n_1 - y);

// Check the beginning
AlignmentA = s0.substr(0, r1) + AlignmentA;
AlignmentB = s1.substr(0, r2) + AlignmentB;

if(r1>r2){
    var v = r1 - r2;
    for(var u=1; u<=v; u++) {tS = tS + e;}
    for(var u=1; u<=v+r2; u++) {tM = tM + e;}
    AlignmentB = tS + AlignmentB;
    AlignmentM = tM + AlignmentM;
} else {
    var v = r2 - r1;
    for(var u=1; u<=v; u++) {tS = tS + e;}
    for(var u=1; u<=v+r1; u++) {tM = tM + e;}
    AlignmentA = tS + AlignmentA;
    AlignmentM = tM + AlignmentM;
}

// Print the alignment
document.write(AlignmentA + '<br>');
document.write(AlignmentM + '<br>');
document.write(AlignmentB + '<br>');

// Matching function
function f(a1, a2) {
    if(a1 === a2){return Match;} else {return Mismatch;}
}

/* Snippet stylesheet */
body {
    padding: 1rem;
    font-family: monospace;
    font-size: 18px;
    font-style: normal;
    font-variant: normal;
    line-height: 20px;
}

The implementation also contains an important addition: the sequences are positioned relative to each other so that the alignment is displayed correctly. The addition is explained below, and more can be found here.

Running the implementation above produces:

ATTGCGCGCAT
  |||
 ATGCTTAACCA
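
The positioning step can be illustrated in isolation. The sketch below is not the book's code: `positionRows` and its parameters are hypothetical names, and the start offsets 2 and 1 are read off the output shown above rather than computed. It assumes each row already contains the full sequence and that `offsetA` and `offsetB` are the zero-based positions where the aligned region begins in each row.

```javascript
// Hypothetical helper (not from the book): shift the row whose aligned region
// starts earlier so that both aligned regions begin in the same column.
function positionRows(seqA, seqB, offsetA, offsetB, markers, filler = ' ') {
    // Column where the aligned region starts once both rows are padded.
    const lead = Math.max(offsetA, offsetB);
    const padA = filler.repeat(lead - offsetA);
    const padB = filler.repeat(lead - offsetB);
    // The marker row is pushed right by the same number of columns.
    return [padA + seqA, filler.repeat(lead) + markers, padB + seqB];
}

// Offsets assumed from the displayed result: "TGC" starts at index 2 in the
// first sequence and at index 1 in the second.
const rows = positionRows('ATTGCGCGCAT', 'ATGCTTAACCA', 2, 1, '|||');
console.log(rows.join('\n'));
```

In a browser page the filler would be something like `'&emsp;'`, as in the snippet, because plain spaces collapse in HTML.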

For more visit: https://github.com/gagniuc/Local-sequence-alignment-in-JS

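The part of the implementation that is responsible for the traceback is shown below. Note that there is no separate traceback matrix: at each step the three candidate scores A, B and C are recomputed from the neighbouring cells and compared with the current cell.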

while (i>=2 || j>=2) {

    var Ai = m[i][0];
    var Bj = m[0][j];

    A = m[i-1][j-1] + f(Ai, Bj);
    B = m[i-1][j] + gap;
    C = m[i][j-1] + gap;

    if(i>=2 && j>=2 && m[i][j]==A) {

        AlignmentA = Ai + AlignmentA;
        AlignmentB = Bj + AlignmentB;

        if(Ai==Bj){
            AlignmentM = '|' + AlignmentM;
        } else {
            AlignmentM = e + AlignmentM;
        }

        i = i - 1;
        j = j - 1;

    } else {

        if(i>=2 && m[i][j]==B) {
            AlignmentA = Ai + AlignmentA;
            AlignmentB = '-' + AlignmentB;
            AlignmentM = e + AlignmentM;
            i = i - 1;

        } else {
            AlignmentA = '-' + AlignmentA;
            AlignmentB = Bj + AlignmentB;
            AlignmentM = e + AlignmentM;
            j = j - 1;
        }
    }

    var r1 = i - 1;
    var r2 = j - 1;

    if(m[i][j]<=0){break;}
}
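
The loop above stops as soon as a cell with a score of zero or less is reached, and the full snippet starts from the highest-scoring cell, which is the behaviour of a local alignment. For the global case asked about in the question, the same idea of recomputing the three candidates can be applied from the bottom-right cell until both indices reach zero. The following is only a sketch under stated assumptions, not code from the book: `globalAlign` and its parameter names are hypothetical, the gap penalty is linear, and the default scores simply reuse the snippet's Match, Mismatch and gap values.

```javascript
// Hypothetical sketch of a global (Needleman-Wunsch style) alignment with a
// traceback that uses only the score matrix, no separate pointer matrix.
function globalAlign(a, b, match = 2, mismatch = -1, gap = -2) {
    const score = (x, y) => (x === y ? match : mismatch);
    const n = a.length;
    const k = b.length;

    // Fill the (n+1) x (k+1) matrix; row 0 and column 0 hold cumulative gaps.
    const m = [];
    for (let i = 0; i <= n; i++) {
        m[i] = new Array(k + 1).fill(0);
        m[i][0] = i * gap;
    }
    for (let j = 0; j <= k; j++) { m[0][j] = j * gap; }
    for (let i = 1; i <= n; i++) {
        for (let j = 1; j <= k; j++) {
            m[i][j] = Math.max(
                m[i - 1][j - 1] + score(a[i - 1], b[j - 1]), // diagonal
                m[i - 1][j] + gap,                           // gap in b
                m[i][j - 1] + gap                            // gap in a
            );
        }
    }

    // Traceback: start at the bottom-right cell and continue until (0, 0),
    // checking which neighbour could have produced the current score.
    let i = n, j = k, outA = '', outB = '';
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 && m[i][j] === m[i - 1][j - 1] + score(a[i - 1], b[j - 1])) {
            outA = a[i - 1] + outA; outB = b[j - 1] + outB; i--; j--;
        } else if (i > 0 && m[i][j] === m[i - 1][j] + gap) {
            outA = a[i - 1] + outA; outB = '-' + outB; i--;
        } else {
            outA = '-' + outA; outB = b[j - 1] + outB; j--;
        }
    }
    return { a: outA, b: outB, score: m[n][k] };
}

const result = globalAlign('ATTGCGCGCAT', 'ATGCTTAACCA');
console.log(result.a);
console.log(result.b);
```

Compared with the Java code in the question, two things differ: the diagonal test covers both matches and mismatches, and the loop runs while either index is still positive, so leading gaps are not lost. Which of several equally scoring alignments is printed depends on the scoring values and on the order of the three tests, so the output need not be identical to that of another tool.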

The book Algorithms in Bioinformatics: Theory and Implementation shows the correct implementation of the traceback in the chapter on sequence alignment, for both global and local sequence alignment:

[Five images illustrating the traceback]

Traceback rules. (a) shows the link between the implementation and the relative position of each element. (b, c, d) show the first three iterations made by the traceback module in the global alignment case. (e) shows the positions of the elements against which equality is checked. (f) shows the complete traceback path and the two sequences aligned according to this path.

I also give a link to an HTML/JS/CSS implementation of local sequence alignment called Bio-Jupiter, which allows experimentation with the traceback rules. The Bio-Jupiter application supports a method called forced alignment, which allows a traceback from any element in the score matrix.

https://github.com/gagniuc/Jupiter-Bioinformatics-V2-dark

Live: Bio-Jupiter

[Image: Jupiter Bioinformatics]
