简体   繁体   中英

string alignment inspired by global DNA sequence allignment

I would like to do something similar to this:

library(Biostrings)
s1 <-DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAGAAGACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG")
s2 <-DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATGTTTCACTACTTCCTTTCGGGTAAGTGTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATCAAATATATAAATATATAAAAATATAATTTTCATCAAATATATAAAAATATAATTTTCATC")
pairwiseAlignment(s1,s2)

but with strings like this:

x123 x4531 etc.

instead of DNA alphabet characters. Is anyone aware of a package to achieve this in R or even Python. Thanks!

Biopython's Align module can accept an alphabet of your choice, eg

>>> from Bio import Align
>>> aligner = Align.PairwiseAligner()
>>> aligner.mode = "global"
>>> aligner.alphabet
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> aligner.alphabet += "1234567890"
>>> alignments = aligner.align("X123Y", "B12XYXYXYX")
>>> print(alignments[0])
X-123-Y-----
--||--|-----
-B12-XYXYXYX

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM