How to align reads to two SHORT reference sequences and see the percentage that mapped to one or the other reference?

Original 2022-12-20 23:18:31 2 1 bioinformatics/ fastq/ sequence-alignment

I have PCR-Amplified fastq files of a specific target region from several samples. For each sample, I want to know the percentage of reads that align better to reference sequence #1 or #2 posted below. How should I begin to tackle this question and what tool for alignment is best?

I am working with Illumina paired-end adapter sequences spiked-in on a 2X150 run. The two reference amplicons are 173 and 179 bp:

1: aaaaagtataaatataggaccaggcagagcattttatacaacaggagaaataataggagatataagacaagcacattgtaaccttagtagagcaaaatggaatgacactttaaataagatagttataaaattaagagaacaatttgggaataaaacaatagtctttaagcact

2: aaaaagtatccgtatccagaggggaccagggagagcatttgttacaataggaaaaataggaaatatgagacaagcacattgtaacattagtagagcaaaatggaatgccactttaaaacagatagctagcaaattaagagaacaatttggaaataataaaacaataatctttaagcaat

We want to know whether one virus wins over the other after infection, based on the differences between these two sequences; so essentially the percentage of reads that align best to #1 and the percentage that align best to #2.

Thank You,

Sara

1 answer

Convert your reference amplicons to fasta format.
Choose an aligner, such as bwa mem, bowtie2, etc.
Index the reference for your aligner of choice.
Align the reads to the reference using your aligner of choice.
Use samtools idxstats to find the number of reads aligned to each of the amplicons.
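
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested pipeline: it assumes bwa and samtools are installed and on your PATH, and the function names and file names (write_fasta, count_reads, ref.fa, the prefix argument) are placeholders invented for illustration. The parsing relies on samtools idxstats printing one tab-separated line per reference: name, length, mapped count, unmapped count.

```python
import subprocess


def write_fasta(records, path):
    """Write {name: sequence} pairs to a FASTA file, 60 bases per line."""
    with open(path, "w") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n")
            for i in range(0, len(seq), 60):
                handle.write(seq[i:i + 60] + "\n")


def percent_per_reference(idxstats_text):
    """Convert `samtools idxstats` output to {reference: % of mapped reads}.

    The "*" line (reads not assigned to any reference) is skipped, so the
    percentages are relative to mapped reads only.
    """
    counts = {}
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name != "*":
            counts[name] = int(mapped)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


def count_reads(ref_fasta, fastq_r1, fastq_r2, prefix):
    """Index, align, sort, and summarize one paired-end sample (hypothetical helper)."""
    subprocess.run(["bwa", "index", ref_fasta], check=True)
    sam = f"{prefix}.sam"
    with open(sam, "w") as out:
        # bwa mem writes SAM to standard output
        subprocess.run(["bwa", "mem", ref_fasta, fastq_r1, fastq_r2],
                       stdout=out, check=True)
    bam = f"{prefix}.sorted.bam"
    subprocess.run(["samtools", "sort", "-o", bam, sam], check=True)
    subprocess.run(["samtools", "index", bam], check=True)
    stats = subprocess.run(["samtools", "idxstats", bam],
                           check=True, capture_output=True, text=True)
    return percent_per_reference(stats.stdout)
```

You would call write_fasta once with the two amplicon sequences, then count_reads once per sample. Note that idxstats counts alignment records rather than read pairs, so with paired-end data both mates contribute to the totals; the percentages are still comparable between the two amplicons.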

Notes:

It is often a good idea to trim adapters from the reads before you align them. A number of good adapter trimmers exist, such as flexbar, skewer, etc.
Many of the popular bioinformatics packages mentioned above can be easily installed, for example using conda.

REFERENCES:

conda
bwa
bowtie2
samtools
flexbar
skewer

solution1 0 2022-12-21 16:43:03
