
global sequence alignment dynamic programming finding the minimum in a matrix

I have 2 sequences, AACAGTTACC and TAAGGTCA, and I'm trying to find a global sequence alignment. I managed to create a 2D array and the matrix, and I even filled it using a semi-dynamic approach.

void process() {
    for (int i = 1; i <= sequenceA.length; i++) {
        for (int j = 1; j <= sequenceB.length; j++) {
            int scoreDiag = opt[i-1][j-1] + equal(i, j);
            int scoreLeft = opt[i][j-1] - 1;
            int scoreUp = opt[i-1][j] - 1;
            opt[i][j] = Math.max(Math.max(scoreDiag, scoreLeft), scoreUp);
        }
    }
}

private int equal(int i, int j) {
    if (sequenceA[i - 1] == sequenceB[j - 1]) {
        return 1;
    } else {
        return -1;
    }
}

There are several things that you need to modify:

  1. Note that in the image you gave us, the alignment goes from the bottom-right corner to the top-left corner. So in that image they are not really aligning AACAGTTACC and TAAGGTCA, but CCATTGACAA and ACTGGAAT (both sequences reversed).
  2. You say that you want a global alignment, but what you actually compute is closer to a local alignment (strictly speaking, since you never clamp scores at 0, it is a semi-global alignment with free leading gaps). The main difference lies in the penalties at the beginning of the sequences: in a global alignment you have to charge insertions and deletions along the first row and the first column.
  3. You are not applying the penalties you mention correctly. Instead, you always penalize with -1 and reward with +1.
  4. In the example image they are not taking the maximum value at each position, but the minimum. This is because your penalties are positive and the reward is 0, not the other way around, so you want to minimize the values.
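To make points 1 and 2 concrete, here is a small sketch that computes the minimum-cost matrix with the penalties used in this answer (gap = 2, substitution = 1, match = 0) under two initializations: charging gaps along the first row/column (global alignment) or leaving them at 0 as in the question's code (free leading gaps). The class and method names are hypothetical, chosen only for illustration. Reversing both sequences reverses every alignment column by column, so the global cost is the same for the forward and the reversed pair.

```java
public class InitComparison {
    // Costs used in this answer; lower is better, so we minimize.
    static final int GAP = 2, SUBSTITUTION = 1, MATCH = 0;

    // Fills the cost matrix and returns the bottom-right cell.
    // chargeLeadingGaps = true  -> global alignment: first row/column accumulate gap costs
    // chargeLeadingGaps = false -> first row/column stay 0 (as in the question's code),
    //                              so leading gaps cost nothing
    static int alignmentCost(String a, String b, boolean chargeLeadingGaps) {
        int[][] opt = new int[a.length() + 1][b.length() + 1];
        if (chargeLeadingGaps) {
            for (int i = 1; i <= a.length(); i++) opt[i][0] = opt[i - 1][0] + GAP;
            for (int j = 1; j <= b.length(); j++) opt[0][j] = opt[0][j - 1] + GAP;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = opt[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : SUBSTITUTION);
                int left = opt[i][j - 1] + GAP; // insertion
                int up = opt[i - 1][j] + GAP;   // deletion
                opt[i][j] = Math.min(diag, Math.min(left, up));
            }
        }
        return opt[a.length()][b.length()];
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        String a = "AACAGTTACC", b = "TAAGGTCA";
        System.out.println("global, forward:   " + alignmentCost(a, b, true));
        System.out.println("global, reversed:  " + alignmentCost(reverse(a), reverse(b), true));
        System.out.println("free leading gaps: " + alignmentCost(a, b, false));
    }
}
```

The two global lines should print the same value (7 for these sequences, matching the bottom-right cell of the matrix shown further down). The free-leading-gap variant can never cost more, because its first row and column start lower and every later cell is a min/plus of earlier cells.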

The full solution is:

public class GlobalAlignment {
    public static void main(String[] args) {
        // Note that these sequences are reversed!
        String sequenceA = "CCATTGACAA";
        String sequenceB = "ACTGGAAT";

        // The penalties to apply
        int gap = 2, substitution = 1, match = 0;

        int[][] opt = new int[sequenceA.length() + 1][sequenceB.length() + 1];

        // First of all, compute insertions and deletions at 1st row/column
        for (int i = 1; i <= sequenceA.length(); i++)
            opt[i][0] = opt[i - 1][0] + gap;
        for (int j = 1; j <= sequenceB.length(); j++)
            opt[0][j] = opt[0][j - 1] + gap;

        for (int i = 1; i <= sequenceA.length(); i++) {
            for (int j = 1; j <= sequenceB.length(); j++) {
                int scoreDiag = opt[i - 1][j - 1] +
                        (sequenceA.charAt(i - 1) == sequenceB.charAt(j - 1) ?
                            match :         // same symbol
                            substitution);  // different symbol
                int scoreLeft = opt[i][j - 1] + gap; // insertion
                int scoreUp = opt[i - 1][j] + gap;   // deletion
                // we take the minimum
                opt[i][j] = Math.min(Math.min(scoreDiag, scoreLeft), scoreUp);
            }
        }

        // Print the matrix, one row per line
        for (int i = 0; i <= sequenceA.length(); i++) {
            for (int j = 0; j <= sequenceB.length(); j++)
                System.out.print(opt[i][j] + "\t");
            System.out.println();
        }
    }
}

The result is just as in the example you gave us (but reversed, remember!):

0   2   4   6   8   10  12  14  16  
2   1   2   4   6   8   10  12  14  
4   3   1   3   5   7   9   11  13  
6   4   3   2   4   6   7   9   11  
8   6   5   3   3   5   7   8   9   
10  8   7   5   4   4   6   8   8   
12  10  9   7   5   4   5   7   9   
14  12  11  9   7   6   4   5   7   
16  14  12  11  9   8   6   5   6   
18  16  14  13  11  10  8   6   6   
20  18  16  15  13  12  10  8   7

So the final alignment score is found at opt[sequenceA.length()][sequenceB.length()] (7). If you really need to show the reversed matrix as in the image, do this:

for (int i = sequenceA.length(); i >=0; i--) {
    for (int j = sequenceB.length(); j >= 0 ; j--)
        System.out.print(opt[i][j] + "\t");
    System.out.println();
}
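The score alone does not tell you which alignment achieves it. A common way to recover one optimal alignment is a traceback: start at the bottom-right cell and repeatedly step to a neighbour whose value, plus the cost of the move, reproduces the current cell. The sketch below assumes the same costs as above (gap = 2, substitution = 1, match = 0); the class and method names are hypothetical, and ties are broken by preferring diagonal, then up, then left, so it returns just one of possibly several optimal alignments.

```java
public class Traceback {
    static final int GAP = 2, SUBSTITUTION = 1, MATCH = 0;

    // Same filling procedure as the solution above (minimizing cost).
    static int[][] fill(String a, String b) {
        int[][] opt = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) opt[i][0] = opt[i - 1][0] + GAP;
        for (int j = 1; j <= b.length(); j++) opt[0][j] = opt[0][j - 1] + GAP;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int diag = opt[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : SUBSTITUTION);
                opt[i][j] = Math.min(diag, Math.min(opt[i][j - 1] + GAP, opt[i - 1][j] + GAP));
            }
        return opt;
    }

    // Walks from opt[n][m] back to opt[0][0]; returns the two gapped rows.
    static String[] traceback(String a, String b, int[][] opt) {
        StringBuilder rowA = new StringBuilder(), rowB = new StringBuilder();
        int i = a.length(), j = b.length();
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && opt[i][j] == opt[i - 1][j - 1]
                    + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : SUBSTITUTION)) {
                rowA.append(a.charAt(i - 1)); // match or substitution
                rowB.append(b.charAt(j - 1));
                i--; j--;
            } else if (i > 0 && opt[i][j] == opt[i - 1][j] + GAP) {
                rowA.append(a.charAt(i - 1)); // deletion: gap in b
                rowB.append('-');
                i--;
            } else {
                rowA.append('-');             // insertion: gap in a
                rowB.append(b.charAt(j - 1));
                j--;
            }
        }
        // Characters were collected from the end, so reverse them.
        return new String[] { rowA.reverse().toString(), rowB.reverse().toString() };
    }

    public static void main(String[] args) {
        String a = "AACAGTTACC", b = "TAAGGTCA";
        int[][] opt = fill(a, b);
        String[] alignment = traceback(a, b, opt);
        System.out.println("cost: " + opt[a.length()][b.length()]);
        System.out.println(alignment[0]);
        System.out.println(alignment[1]);
    }
}
```

Because the global cost does not change when both sequences are reversed, the sketch runs directly on the original (forward) sequences instead of the reversed ones.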

The example below shows how to find the minimum and maximum values in the score matrix. It is a JavaScript implementation that computes the score matrix for a global alignment and also records its minimum and maximum values. The entire alignment process is described in Paul A. Gagniuc, Algorithms in Bioinformatics: Theory and Implementation, John Wiley & Sons, Hoboken, NJ, USA, 2021, ISBN 9781119697961, and the code is available here:

https://github.com/gagniuc/Local-sequence-alignment-in-JS

or here:

https://bcs.wiley.com/he-bcs/Books?action=index&itemId=1119697964&bcsId=12108

// Variable statement
var Match = +2;
var Mismatch = -1;
var gap = -2;

var s0 = 'AACAGTTACC';
var s1 = 'TAAGGTCA';

var MMax = 0;
var MMin = 0;
var x = 0, y = 0; // position of the maximum
var m = [];
var s = [];

// Matrix initialization and completion
s[0] = s0.split('');
s[1] = s1.split('');

var n_0 = s[0].length + 1;
var n_1 = s[1].length + 1;

for (var i = 0; i <= n_0; i++) {
    m[i] = [];
    for (var j = 0; j <= n_1; j++) {
        m[i][j] = 0;

        // Row 1 and column 1 hold the accumulated gap penalties
        if (i == 1 && j > 1) { m[i][j] = m[i][j - 1] + gap; }
        if (j == 1 && i > 1) { m[i][j] = m[i - 1][j] + gap; }

        // Row 0 and column 0 hold the sequence letters
        if (i > 1) { m[i][0] = s[0][i - 2]; }
        if (j > 1) { m[0][j] = s[1][j - 2]; }

        if (i > 1 && j > 1) {
            var A = m[i - 1][j - 1] + f(m[i][0], m[0][j]); // diagonal: match/mismatch
            var B = m[i - 1][j] + gap;                     // up: gap
            var C = m[i][j - 1] + gap;                     // left: gap
            m[i][j] = Math.max(A, B, C);

            if (m[i][j] > MMax) { MMax = m[i][j]; x = i; y = j; }
            if (m[i][j] < MMin) { MMin = m[i][j]; }
        }
    }
}

document.write('Max:' + MMax + '<br>');
document.write('Min:' + MMin + '<hr>');
document.write('Score matrix:' + SMC(m));

// Matching function
function f(a1, a2) {
    if (a1 === a2) { return Match; } else { return Mismatch; }
}

// SHOW MATRIX CONTENT
function SMC(m) {
    var r = "<table border=1>";
    for (var i = 0; i < m.length; i++) {
        r += "<tr>";
        for (var j = 0; j < m[i].length; j++) {
            r += "<td>" + m[i][j] + "</td>";
        }
        r += "</tr>";
    }
    r += "</table>";
    return r;
}

(Figure: maximum and minimum values on the score matrix)

Note that the implementation uses the two DNA sequences you gave, namely s0 = AACAGTTACC and s1 = TAAGGTCA.
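For comparison with the Java code earlier in this thread, here is a rough Java port of the same idea, assuming the JavaScript's scoring (match +2, mismatch -1, gap -2, maximizing) and its convention of starting MMax and MMin at 0 and only looking at the filled (non-border) cells. Unlike the JavaScript, it uses a plain (n+1)×(m+1) matrix without the letter row and column, and it finds the extremes in a separate pass instead of during the fill; the result is the same either way. The class and method names are hypothetical.

```java
public class ScoreMatrixExtremes {
    // Scoring taken from the JavaScript example (higher is better).
    static final int MATCH = 2, MISMATCH = -1, GAP = -2;

    // Global (Needleman-Wunsch style) score matrix, maximizing.
    static int[][] scoreMatrix(String s0, String s1) {
        int[][] m = new int[s0.length() + 1][s1.length() + 1];
        for (int i = 1; i <= s0.length(); i++) m[i][0] = m[i - 1][0] + GAP;
        for (int j = 1; j <= s1.length(); j++) m[0][j] = m[0][j - 1] + GAP;
        for (int i = 1; i <= s0.length(); i++) {
            for (int j = 1; j <= s1.length(); j++) {
                int diag = m[i - 1][j - 1]
                        + (s0.charAt(i - 1) == s1.charAt(j - 1) ? MATCH : MISMATCH);
                int up = m[i - 1][j] + GAP;
                int left = m[i][j - 1] + GAP;
                m[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        return m;
    }

    // Returns {max, min} over the non-border cells, starting from 0 like MMax/MMin.
    static int[] extremes(int[][] m) {
        int max = 0, min = 0;
        for (int i = 1; i < m.length; i++) {
            for (int j = 1; j < m[i].length; j++) {
                max = Math.max(max, m[i][j]);
                min = Math.min(min, m[i][j]);
            }
        }
        return new int[] { max, min };
    }

    public static void main(String[] args) {
        int[][] m = scoreMatrix("AACAGTTACC", "TAAGGTCA");
        int[] ex = extremes(m);
        System.out.println("Max: " + ex[0]);
        System.out.println("Min: " + ex[1]);
        System.out.println("Global score: " + m[m.length - 1][m[0].length - 1]);
    }
}
```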

Have a look at http://en.wikipedia.org/wiki/Longest_common_substring; the code there is given for several languages, is pretty much copy-paste, and is easily adapted to also tell you the alignment index. I had to do a similar thing and ended up with https://github.com/Pomax/DOM-diff/blob/rewrite/rewrite/rewrite.html#L103

(The SubsetMapping it returns is basically a simple struct that gives the index in both contexts: https://github.com/Pomax/DOM-diff/blob/rewrite/rewrite/rewrite.html#L52)
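A minimal Java sketch of the usual dynamic-programming approach to the longest common substring, extended to report where the match ends in each string, might look like this. It is not taken from the Wikipedia page or the linked repository; the class name, the method name, and the {length, endA, endB} return layout are hypothetical choices for illustration. Keep in mind that this finds one contiguous shared run, which is a different problem from a global alignment.

```java
public class LongestCommonSubstring {
    // Returns {length, endA, endB}. The substring is a.substring(endA - length, endA),
    // which equals b.substring(endB - length, endB). On ties, the first one found
    // while scanning a (outer loop) and then b (inner loop) is kept.
    static int[] find(String a, String b) {
        // len[i][j] = length of the common run ending at a[i-1] and b[j-1]
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int best = 0, endA = 0, endB = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > best) {
                        best = len[i][j];
                        endA = i;
                        endB = j;
                    }
                }
            }
        }
        return new int[] { best, endA, endB };
    }

    public static void main(String[] args) {
        String a = "AACAGTTACC", b = "TAAGGTCA";
        int[] r = find(a, b);
        System.out.println("length " + r[0] + ": \"" + a.substring(r[1] - r[0], r[1])
                + "\" starts at index " + (r[1] - r[0]) + " in a and " + (r[2] - r[0]) + " in b");
    }
}
```

For these two sequences the longest shared run has length 2, and several runs tie (for example AA, AG, GT, TA, CA); this sketch reports the first one it meets.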

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.
