How can I fit my data and question in a script for the ace function of the ape package in RStudio?

Question

I have 96 amino acid sequences, which I aligned with MAFFT and trimmed manually (FASTA format). I chose the model of amino acid substitution with ProtTest (LG+I+G model), did the phylogenetic reconstruction with MEGAX (ML method, bootstrap test with 1000 replicates, tree in Newick format) and the ancestral reconstruction with PAML, for a total of 664 final amino acid positions. However, my alignment has indels. I am naming each indel with a letter (A to T) followed by the respective amino acid position range: A:89-92, B:66-67, C:181-186, D:208-208, E:214-219, F:244-250, G:237-296, H:278-280, I:295-295, J:329-334, K:345-349, L:371-375, M:390-425, N:432-433, O:440-443, P:480-480, Q:500-500, R:541-544, S:600-600. Both the initial and final parts of the sequences are very variable, so from positions 0 to 34 (initial) and 600 to 664 (final), each amino acid position may represent an indel.

I want to know, for each ancestral node, the probability that each indel is present in the ancestral sequence. I was told that the "ace" function in the R package "ape" (Analyses of Phylogenetics and Evolution) can perform this task. I have installed both "ape" and "ggtree" in RStudio. I checked this webpage https://www.rdocumentation.org/packages/ape/versions/3.0-1/topics/ace , but I have no idea how to construct the script. I am a biologist and a newbie to R.

Can someone please help? Would be greatly appreciated, thanks.

Answer 1

It's hard to figure out exactly what you'll need from your example, but the following should fit the general idea:

1 - Load your tree in `R`

For this step you can use the functions read.tree or read.nexus depending on your tree format, i.e. whether your phylogenetic software outputs a NEXUS file (usually the first line in these files is #NEXUS and the last line is end; or END;) or a Newick file (usually the first line starts directly with the phylogeny, like ((my_species..., and finishes with ;). Since you exported your tree in Newick format, read.tree is the one to use. You can locate this file and then read it in R using:

## Loading the package
library(ape)
## Reading the tree
my_tree <- read.tree("<the_path_to_your_file>")
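Before going further, it can help to check that the tree is in a shape that ace will accept. As far as I know, for discrete characters ace expects a rooted, fully dichotomous tree with branch lengths, and ML trees exported by phylogenetic software are often unrooted. The following is only a sketch: the Newick string is a made-up toy tree standing in for your file, and the outgroup "sp_D" is purely hypothetical (you would use your own outgroup sequence).

```r
library(ape)

# Toy unrooted tree standing in for read.tree("<the_path_to_your_file>")
my_tree <- read.tree(text = "(sp_A:0.1,sp_B:0.2,(sp_C:0.3,sp_D:0.4):0.5);")

# Check the properties that matter for the ancestral estimation
is.rooted(my_tree)                # FALSE here: the tree starts with a trichotomy
!is.null(my_tree$edge.length)     # TRUE if the tree has branch lengths

# Root the tree on a (hypothetical) outgroup and resolve the root as a bifurcation
rooted_tree <- root(my_tree, outgroup = "sp_D", resolve.root = TRUE)
is.rooted(rooted_tree)
is.binary(rooted_tree)
```

If is.binary still returns FALSE after rooting, the tree contains polytomies; multi2di from ape can resolve them arbitrarily with zero-length branches, although whether that is acceptable depends on your analysis.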

2 - Load your trait data in `R`

You will then need to load your trait data (for example the indel positions you've listed above) as a matrix or a data.frame. The easiest is to have them in .csv format ("comma-separated values"), with one row per sequence and one column per indel, which you can then read in R using the function read.csv:

## Reading the variables as a data.frame (read.csv returns a data.frame, not a matrix)
my_variables <- read.csv("<the_path_to_your_file>")
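If you do not have such a table yet, one possible way to build it is to score each indel region of the alignment as present (1) or absent (0) for every sequence. The sketch below is only an illustration in base R: the three sequences, the two regions A and B, and the rule "1 if the region contains at least one residue, 0 if it is all gaps" are all assumptions made for the example, not your real data or necessarily the coding you want.

```r
# Toy alignment: three made-up sequences of 10 aligned positions ("-" is a gap)
aln <- c(sp_A = "MKT--LVAGG",
         sp_B = "MKTAALVAGG",
         sp_C = "MKT--LV--G")

# Hypothetical indel regions, named by letter as in the question (1-based positions)
indels <- list(A = 4:5, B = 8:9)

# One row per sequence, one column per aligned position
aln_mat <- do.call(rbind, strsplit(aln, ""))

# Score each region: 1 if it holds at least one residue, 0 if it is all gaps
my_variables <- sapply(indels, function(pos) {
  as.integer(rowSums(aln_mat[, pos, drop = FALSE] != "-") > 0)
})
rownames(my_variables) <- names(aln)

# Rows are sequences, columns are indels
my_variables
```

A table like this can be saved with write.csv and read back with read.csv using row.names = 1, so that the sequence names become the row names rather than a data column.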

3 - Running an ancestral character estimation

Finally, you can run your ancestral character estimation for each of your variables using the ace function from the package ape:

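## Selecting the variable of interest (e.g. the first column of the dataset)
one_variable <- my_variables[, 1]
## Note: ace matches unnamed values to the tips in the order of my_tree$tip.label,
## so the rows must follow that order (or one_variable must be named by tip label)
## Running the ancestral character estimation for this variable
my_ace <- ace(x = one_variable, phy = my_tree, type = "discrete")
## Looking at the results
my_ace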
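To get at the per-node values asked about in the question, the element lik.anc of the result holds the scaled likelihoods of each state at each internal node, which are commonly read as the relative support for presence or absence at that node. Here is a minimal, self-contained sketch; the tree and the 0/1 scores are invented for illustration, and the equal-rates model ("ER") is just one possible choice, not a recommendation for your data.

```r
library(ape)

# Toy rooted, binary tree with branch lengths (made up for illustration)
my_tree <- read.tree(
  text = "((sp_A:1,sp_B:1):1,((sp_C:1,sp_D:1):1,(sp_E:1,sp_F:1):1):1);"
)

# Made-up presence (1) / absence (0) scores for one indel, named by tip label
one_variable <- c(sp_A = 1, sp_B = 1, sp_C = 0, sp_D = 1, sp_E = 0, sp_F = 0)

# Equal-rates model: a single rate for gaining and losing the indel
my_ace <- ace(x = one_variable, phy = my_tree, type = "discrete", model = "ER")

# One row per internal node, one column per state ("0" and "1")
node_support <- my_ace$lik.anc
# Internal nodes are numbered after the tips in an ape tree
rownames(node_support) <- Ntip(my_tree) + seq_len(Nnode(my_tree))
round(node_support, 3)

# Display the support on the tree as pie charts at the nodes
plot(my_tree)
nodelabels(pie = node_support, piecol = c("white", "black"), cex = 0.6)
```

The same call can be repeated for every column of your indel table, for example with a for loop or lapply over the column names, to get one table of node support per indel.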

Of course there is much more to it, but hopefully this will get you started.

solution1
0 2021-07-09 11:07:15
