
How to build the scoring matrix for global sequence alignment?

I am trying to compute the global sequence alignment of two strings, but my code gives the wrong answer. This is how I generate the scoring matrix and read the alignment from it:

public void makeScoringMatrix(String v,String w)
{
    int ar[][]=new int[v.length()+1][w.length()+1];
    for(int i=v.length()-1;i>=0;i--)
    {
        for(int j=w.length()-1;j>=0;j--)
        {
            if(v.charAt(i)==w.charAt(j))
                ar[i][j]=ar[i+1][j+1]+1;
            else if(v.charAt(i)!=w.charAt(j))
                ar[i][j]=ar[i+1][j+1]+0;
            else
                ar[i][j]=Math.max(ar[i][j+1],Math.max(ar[i+1][j],ar[i+1][j+1]));
        }
    }
    //printArray(ar);
    getGlobalAlignment(ar,v,w);
}

public void getGlobalAlignment(int ar[][],String v,String w)
{
    int i=0,j=0,index=0;
    while(i<v.length() && j<w.length())
    {
        if(v.charAt(i)==w.charAt(j))
        {
            System.out.print(v.charAt(i));
            i++;
            j++;
            index++;

        }
        else if(ar[i+1][j]>ar[i][j+1])
        {
            i++;
        }
        else
        {
            j++;
        }
    }

}

Your scoring matrix is incorrect. If you print the matrix you will see that it looks like this:

    A  T  C  A
A [3, 0, 0, 1, 0]
G [3, 0, 0, 1, 0]
C [3, 0, 0, 1, 0]
A [3, 0, 0, 1, 0]
  [3, 0, 0, 1, 0]

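The problem is in how each cell is filled. The == and != branches together cover every case, so the final else is never reached and every cell is derived from its diagonal neighbour alone. Each cell should instead take the best of its three adjacent cells: the diagonal (match or mismatch) and the two gap moves.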

You will also notice that the last column is all 0s. In the usual formulation it is the first row and the first column that hold the terminal (boundary) values, which is why the matrix has length+1 rows and columns.

Finally, I believe that during the traceback for a global alignment you should start at the final position in the matrix and walk backwards (hence the term traceback). When you walk forward over your alignment, you compare the characters of the two sequences rather than the scores in the matrix, which I don't think is correct.
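The traceback described above can be sketched roughly as follows. This is only an illustration, not the asker's or the answerer's code: the class name TracebackSketch, the method names, and the scores (+1 match, -1 mismatch, -1 gap) are all assumptions, and ties are broken in favour of the diagonal move.

```java
// A minimal sketch of Needleman-Wunsch fill plus traceback.
// Assumed scores for illustration: +1 match, -1 mismatch, -1 per gap.
final class TracebackSketch {
    static final int MATCH = 1, MISMATCH = -1, GAP = -1;

    static int sub(char a, char b) {
        return a == b ? MATCH : MISMATCH;
    }

    // Fill the (|v|+1) x (|w|+1) matrix; row 0 and column 0 are the boundary.
    static int[][] fill(String v, String w) {
        int[][] s = new int[v.length() + 1][w.length() + 1];
        for (int i = 1; i <= v.length(); i++) s[i][0] = i * GAP;
        for (int j = 1; j <= w.length(); j++) s[0][j] = j * GAP;
        for (int i = 1; i <= v.length(); i++) {
            for (int j = 1; j <= w.length(); j++) {
                int diag = s[i - 1][j - 1] + sub(v.charAt(i - 1), w.charAt(j - 1));
                int up = s[i - 1][j] + GAP;
                int left = s[i][j - 1] + GAP;
                s[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        return s;
    }

    // Start at the bottom-right cell and walk back to (0,0), checking at each
    // step which neighbouring cell could have produced the current score.
    static String[] traceback(String v, String w) {
        int[][] s = fill(v, w);
        StringBuilder a = new StringBuilder();
        StringBuilder b = new StringBuilder();
        int i = v.length(), j = w.length();
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0
                    && s[i][j] == s[i - 1][j - 1] + sub(v.charAt(i - 1), w.charAt(j - 1))) {
                a.append(v.charAt(i - 1));   // aligned pair (match or mismatch)
                b.append(w.charAt(j - 1));
                i--;
                j--;
            } else if (i > 0 && s[i][j] == s[i - 1][j] + GAP) {
                a.append(v.charAt(i - 1));   // gap in w
                b.append('-');
                i--;
            } else {
                a.append('-');               // gap in v
                b.append(w.charAt(j - 1));
                j--;
            }
        }
        // The alignment was built back to front, so reverse both rows.
        return new String[] { a.reverse().toString(), b.reverse().toString() };
    }
}
```

Calling TracebackSketch.traceback("AC", "A") under these assumed scores yields the rows "AC" and "A-". Other tie-breaking orders would give different but equally scored alignments.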

You should look at the Wikipedia article on Needleman-Wunsch, http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm, or read one of the algorithm books. Durbin et al.'s Biological Sequence Analysis is the classic (but very hard to understand) book that covers pairwise alignments.

Try this code...

public void makeMatrix(String v,String w)
{
    int[][] maxDist=new int[v.length()+1][w.length()+1];
    for(int i=0;i<=v.length();i++)
    {
        for(int j=0;j<=w.length();j++)
        {
            if(i==0)
                maxDist[i][j]=-j;
            else if(j==0)
                maxDist[i][j]=-i;
            else
                maxDist[i][j]=0;
        }
    }
    fillMatrix(maxDist, v, w);
}

public int weight(String v,String w,int i,int j)
{
    if(v.charAt(i-1)==w.charAt(j-1))
        return 1;
    else
        return -1;
}

public void fillMatrix(int[][] ar,String v,String w)
{
    for(int i=1;i<=v.length();i++)
    {
        for(int j=1;j<=w.length();j++)
        {
            int scoreDiagonal=ar[i-1][j-1]+weight(v, w, i, j);
            int scoreLeft=ar[i][j-1]-1;
            int scoreUp=ar[i-1][j]-1;

            ar[i][j]=Math.max(scoreDiagonal, Math.max(scoreLeft, scoreUp));
        }
    }
}

Hope this is the code you are looking for...
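The methods above build the matrix but neither return nor print it, so here is a small sketch for inspecting the result. It reuses the same recurrence and scores (+1 match, -1 mismatch, -1 gap). The class name ScoreMatrixDemo and the method scoreMatrix are hypothetical, and the input strings "AGCA" and "ATCA" are an assumption read off the row and column labels of the matrix printed in the earlier answer.

```java
import java.util.Arrays;

// Illustrative wrapper around the recurrence shown above; names are hypothetical.
final class ScoreMatrixDemo {

    // Build and return the (|v|+1) x (|w|+1) global alignment score matrix.
    static int[][] scoreMatrix(String v, String w) {
        int[][] m = new int[v.length() + 1][w.length() + 1];
        // Boundary: aligning a prefix against the empty string costs one gap per character.
        for (int i = 0; i <= v.length(); i++) m[i][0] = -i;
        for (int j = 0; j <= w.length(); j++) m[0][j] = -j;
        for (int i = 1; i <= v.length(); i++) {
            for (int j = 1; j <= w.length(); j++) {
                int weight = v.charAt(i - 1) == w.charAt(j - 1) ? 1 : -1;
                int scoreDiagonal = m[i - 1][j - 1] + weight;
                int scoreLeft = m[i][j - 1] - 1;
                int scoreUp = m[i - 1][j] - 1;
                m[i][j] = Math.max(scoreDiagonal, Math.max(scoreLeft, scoreUp));
            }
        }
        return m;
    }

    public static void main(String[] args) {
        int[][] m = scoreMatrix("AGCA", "ATCA");
        for (int[] row : m) {
            System.out.println(Arrays.toString(row));
        }
        // The bottom-right cell holds the score of the best global alignment.
        System.out.println("score = " + m[4][4]);
    }
}
```

For these two inputs the bottom-right cell is 2, which corresponds to three matches and one mismatch with no gaps.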

Below is a JavaScript implementation that computes the score matrix for a global alignment. The entire alignment process is described in Paul A. Gagniuc, Algorithms in Bioinformatics: Theory and Implementation, John Wiley & Sons, Hoboken, NJ, USA, 2021, ISBN: 9781119697961, and the code is available here:

https://github.com/gagniuc/Local-sequence-alignment-in-JS

or here:

https://bcs.wiley.com/he-bcs/Books?action=index&itemId=1119697964&bcsId=12108

// Variable statement
var Match = +2;
var Mismatch = -1;
var gap = -2;

var s0 = 'AGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCATCAAGCAGGTCTGTT';
var s1 = 'GAAATGATCCGGAAATTGCAGCCTCAGCCCCCAGCCATCTGCTAACCCC';

var m = [];
var s = [];

// Matrix initialization and completion
s[0] = s0.split('');
s[1] = s1.split('');

var n_0 = s[0].length + 1;
var n_1 = s[1].length + 1;

for (var i = 0; i <= n_0; i++) {
    m[i] = [];
    for (var j = 0; j <= n_1; j++) {
        m[i][j] = 0;

        // Row 1 and column 1 hold the accumulated gap penalties
        if (i == 1 && j > 1) { m[i][j] = m[i][j-1] + gap; }
        if (j == 1 && i > 1) { m[i][j] = m[i-1][j] + gap; }

        // Row 0 and column 0 hold the sequence letters as headers
        if (i > 1) { m[i][0] = s[0][i-2]; }
        if (j > 1) { m[0][j] = s[1][j-2]; }

        if (i > 1 && j > 1) {
            var A = m[i-1][j-1] + f(m[i][0], m[0][j]); // diagonal (match or mismatch)
            var B = m[i-1][j] + gap;                   // from the cell above (gap)
            var C = m[i][j-1] + gap;                   // from the cell to the left (gap)
            m[i][j] = Math.max(A, B, C);
        }
    }
}

document.write('Score matrix:' + SMC(m));

// Matching function
function f(a1, a2) {
    if (a1 === a2) { return Match; } else { return Mismatch; }
}

// SHOW MATRIX CONTENT
function SMC(m) {
    var r = "<table border=1>";
    for (var i = 0; i < m.length; i++) {
        r += "<tr>";
        for (var j = 0; j < m[i].length; j++) {
            r += "<td>" + m[i][j] + "</td>";
        }
        r += "</tr>";
    }
    r += "</table>";
    return r;
}
