I have 96 amino acid sequences, which I aligned with MAFFT and trimmed manually (FASTA format). I chose the amino acid substitution model with ProtTest (LG+I+G), did the phylogenetic reconstruction with MEGA X (ML method, bootstrap test with 1000 replicates, tree in Newick format) and the ancestral sequence reconstruction with PAML, for a total of 664 final amino acid positions. However, my alignment has indels. I am naming each indel with a letter (A to T) and the respective amino acid position range: A:89-92, B:66-67, C:181-186, D:208-208, E:214-219, F:244-250, G:237-296, H:278-280, I:295-295, J:329-334, K:345-349, L:371-375, M:390-425, N:432-433, O:440-443, P:480-480, Q:500-500, R:541-544, S:600-600. Both the initial and final parts of the sequences are very variable, so from positions 0 to 34 (initial) and 600 to 664 (final), each amino acid position may represent an indel.
I want to know, at each ancestral node, the probability that each indel is present in the ancestral sequence. I was told that the function ace in the R package ape ("Analyses of Phylogenetics and Evolution") can perform this task. I have installed both ape and ggtree. I checked this webpage https://www.rdocumentation.org/packages/ape/versions/3.0-1/topics/ace ; however, I have no idea how to construct the script. I am a biologist and a newbie to R.
Can someone please help? It would be greatly appreciated, thanks.
It's hard to figure out exactly what you'll need from your example, but the following could fit the general idea:
Loading your tree into R
For this step you can use the function read.tree or read.nexus, depending on your tree format, i.e. whether your phylogenetic software outputs a NEXUS file (usually the first line of such a file is #NEXUS and the last line is end; or END;) or a Newick file (usually the first line directly starts with the phylogeny, like ((my_species..., and finishes with ;). Since your MEGA X tree is in Newick format, read.tree is the one you need. You can locate this file and then read it in R using:
## Loading the package
library(ape)
## Reading the tree
my_tree <- read.tree("<the_path_to_your_file>")
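ace() with type = "discrete" needs a tree that has branch lengths and is rooted and fully dichotomous, and an ML tree exported from MEGA may well be unrooted (a basal trifurcation). The sketch below is one way to handle both the file-format question and these requirements: it picks read.nexus or read.tree from the first line of the file, then roots the tree on an outgroup and resolves any polytomies if needed. The helper names read_any_tree and prepare_tree_for_ace are hypothetical, and the outgroup is an assumption you have to choose from your own data.

```r
library(ape)

## Hypothetical helper: use read.nexus() if the file starts with #NEXUS,
## otherwise assume a Newick file and use read.tree()
read_any_tree <- function(path) {
  first_line <- readLines(path, n = 1, warn = FALSE)
  if (grepl("^\\s*#NEXUS", first_line, ignore.case = TRUE)) {
    read.nexus(path)
  } else {
    read.tree(path)
  }
}

## Hypothetical helper: make the tree usable by ace(type = "discrete"),
## which needs branch lengths and a rooted, fully dichotomous tree
prepare_tree_for_ace <- function(phy, outgroup = NULL) {
  if (is.null(phy$edge.length)) stop("the tree has no branch lengths")
  if (!is.rooted(phy)) {
    if (is.null(outgroup)) stop("the tree is unrooted: please supply an outgroup")
    phy <- root(phy, outgroup = outgroup, resolve.root = TRUE)
  }
  ## resolve any remaining polytomies with zero-length branches
  if (!is.binary(phy)) phy <- multi2di(phy)
  phy
}

## Example use (path and outgroup label are placeholders):
## my_tree <- prepare_tree_for_ace(read_any_tree("<the_path_to_your_file>"),
##                                 outgroup = "<your_outgroup_label>")
```

Note that multi2di() resolves polytomies randomly, so if your tree has many of them the reconstruction can change slightly between runs.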
Loading your data into R
You will then need to load your trait data (for example, whether each of the indels you've listed above is present or absent in each sequence) as a matrix or a data.frame. The easiest way is to have them in .csv format ("comma-separated values"), which you can then read in R using the function read.csv:
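## Reading the variables as a data.frame (assuming the first column of the
## file holds the sequence names, which then become the row names)
my_variables <- read.csv("<the_path_to_your_file>", row.names = 1)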
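If you don't have such a table yet, here is a sketch of one way to build it from the aligned FASTA file. It assumes, hypothetically, that an indel counts as "present" (1) in a sequence when every alignment column in its range is a gap ("-"), and "absent" (0) otherwise; flip the coding, or treat partially gapped ranges differently, if that fits your definition better. read_fasta_aln and code_indels are hypothetical helpers written in base R. The positions must refer to the same trimmed alignment columns you used to define the indels, and the FASTA names must match the tip labels of your tree.

```r
## Hypothetical helper: read an aligned FASTA file into a named character
## vector (names = header text after ">", values = aligned sequences)
read_fasta_aln <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]              # drop blank lines
  is_header <- startsWith(lines, ">")
  ids <- sub("^>\\s*", "", lines[is_header])
  grp <- cumsum(is_header)                   # sequence lines belong to the last header
  seqs <- tapply(lines[!is_header], grp[!is_header], paste, collapse = "")
  setNames(as.character(seqs), ids)
}

## Hypothetical helper: code each indel as 1 ("present": every alignment
## column in its range is a gap "-") or 0 (at least one residue present)
code_indels <- function(aln, indels) {
  ## aln: named character vector of aligned sequences
  ## indels: named list of c(start, end) alignment positions
  res <- vapply(indels, function(range) {
    vapply(aln, function(s) {
      chars <- strsplit(substr(s, range[1], range[2]), "")[[1]]
      if (length(chars) == 0) return(NA_integer_)  # range outside the sequence
      as.integer(all(chars == "-"))
    }, integer(1))
  }, integer(length(aln)))
  res <- matrix(res, nrow = length(aln),
                dimnames = list(names(aln), names(indels)))
  as.data.frame(res)
}

## Example use on your data (paths are placeholders; add the other indels):
## aln <- read_fasta_aln("<the_path_to_your_alignment>")
## indel_table <- code_indels(aln, list(A = c(89, 92), B = c(66, 67)))
## write.csv(indel_table, "<the_path_to_your_file>")
```

write.csv() stores the sequence names as the first column, so reading the file back with read.csv(..., row.names = 1) keeps them as row names.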
And finally, you can run the ancestral character estimation for each of your variables using the ace function from the package ape:
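## Selecting the variable of interest (e.g. the first column of the dataset),
## keeping the row names (assumed to be the sequence names) so that ace()
## can match the values to the tip labels of the tree
one_variable <- setNames(my_variables[, 1], rownames(my_variables))
## Running the ancestral character estimation for this variable
## (the tree must be rooted, fully dichotomous and have branch lengths)
my_ace <- ace(x = one_variable, phy = my_tree, type = "discrete")
## Looking at the results: my_ace$lik.anc holds, for each internal node,
## the scaled likelihood of each state (e.g. indel absent "0" / present "1")
my_ace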
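To get the probability of each indel at every ancestral node in one go, a sketch is to loop ace() over all columns and keep the column of lik.anc for state "1". It assumes, hypothetically, that indels are coded 0 (absent) / 1 (present), that the row names of the table are the tip labels, and that the tree already meets ace()'s requirements. ace_all_indels is a hypothetical helper; it uses ace()'s default equal-rates model ("ER"), and you can pass model = "ARD" to let gains and losses have different rates. See ?ace for the meaning of the marginal argument before interpreting the values.

```r
library(ape)

## Hypothetical helper: run ace() on every column (indel) of `traits` and
## return, for each internal node, the scaled likelihood of state "1".
## Assumes: states coded 0 (absent) / 1 (present), row names of `traits` are
## the tip labels, and `phy` is rooted, fully dichotomous, with branch lengths.
ace_all_indels <- function(phy, traits, model = "ER") {
  n_node <- Nnode(phy)
  out <- vapply(names(traits), function(indel) {
    x <- setNames(traits[[indel]], rownames(traits))
    ## an indel with a single observed state has nothing to reconstruct
    if (length(unique(x)) < 2) return(rep(NA_real_, n_node))
    fit <- ace(x = x, phy = phy, type = "discrete", model = model)
    unname(fit$lik.anc[, "1"])
  }, numeric(n_node))
  ## rows follow ape's internal node numbering (Ntip + 1, Ntip + 2, ...)
  rownames(out) <- Ntip(phy) + seq_len(n_node)
  out
}

## Example use (with my_tree and my_variables read as above):
## indel_probs <- ace_all_indels(my_tree, my_variables)
## plot(my_tree); nodelabels()   # shows which node number is which
```

For a single indel, plot(my_tree) followed by nodelabels(pie = my_ace$lik.anc, cex = 0.5) draws the state probabilities as pie charts on the nodes.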
Of course there is much more to it, but hopefully this will get you started.