
How can I fit my data and question into a script for the ace function of the ape package in RStudio?

I have 96 amino acid sequences which I aligned with MAFFT and trimmed manually (FASTA format). I chose the model of amino acid substitution with ProtTest (LG+I+G model), did the phylogenetic reconstruction with MEGAX (ML method, bootstrap test with 1000 replicates, tree in Newick format) and the ancestral reconstruction with PAML, for a total of 664 final amino acid positions. However, my alignment has indels. I am naming each indel with a letter (A to T) and the respective amino acid position range: A:89-92, B:66-67, C:181-186, D:208-208, E:214-219, F:244-250, G:237-296, H:278-280, I:295-295, J:329-334, K:345-349, L:371-375, M:390-425, N:432-433, O:440-443, P:480-480, Q:500-500, R:541-544, S:600-600. Both the initial and final parts of the sequences are very variable, so from positions 0 to 34 (initial) and 600 to 664 (final), each amino acid position may represent an indel.

I want to know, at each ancestral node, the probability that each indel is present in the ancestral sequence. I was told that the "ace" function in the R package "ape" (Analyses of Phylogenetics and Evolution) can perform this task. I have installed both "ape" and "ggtree". I checked this webpage https://www.rdocumentation.org/packages/ape/versions/3.0-1/topics/ace , but I have no idea how to construct the script. I am a biologist and a newbie to R.

Can someone please help? Would be greatly appreciated, thanks.

It's hard to figure out exactly what you'll need from your example, but the following should fit the general idea:

1 - Load your tree in R

For this step you can use the function read.tree or read.nexus depending on your tree format, i.e. whether your phylogenetic software outputs a NEXUS file (usually the first line in these files is #NEXUS and the last line is end; or END; ) or a Newick file (usually the first line starts directly with the phylogeny, like ((my_species... , and finishes with ; ). You can locate this file and then read it in R using:

## Loading the package
library(ape)
## Reading the tree
my_tree <- read.tree("<the_path_to_your_file>")
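
Once the tree is loaded, it can help to check that it has the properties ace expects. As far as I know, ace needs branch lengths and stops with an error if the tree is not both rooted and fully dichotomous. Below is a minimal sketch of such checks. It uses a small made-up Newick string (the tip names seq1 to seq4 are hypothetical) in place of your file:

```r
## Loading the package
library(ape)

## Toy Newick string standing in for the real tree file (hypothetical tip names)
toy_newick <- "((seq1:0.1,seq2:0.2):0.05,(seq3:0.3,seq4:0.1):0.02);"
toy_tree <- read.tree(text = toy_newick)

## Number of sequences (tips) in the tree
Ntip(toy_tree)
## TRUE if the tree is rooted
is.rooted(toy_tree)
## TRUE if the tree is fully dichotomous (no polytomies)
is.binary(toy_tree)
## FALSE if the tree has branch lengths
is.null(toy_tree$edge.length)
```

If your own tree turns out to be unrooted or to contain polytomies, the ape functions root and multi2di are the usual starting points. How to root the tree is a biological decision that this sketch does not make for you.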

2 - Load your trait data in R

You will then need to load your trait data (for example the presence or absence of each of the indels you've listed above) as a matrix or a data.frame , with one row per sequence and one column per indel. The easiest way is to store them in a .csv file ("comma separated values") that you can then read in R using the function read.csv :

## Reading the variables as a data.frame
my_variables <- read.csv("<the_path_to_your_file>")
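
As an illustration of what such a table could look like, here is a minimal sketch. It assumes the first column holds the sequence names exactly as they appear in the tree and that each indel is coded 1 (present) or 0 (absent). The CSV content, the names seq1 to seq4 and the indel columns A and B are all made up for the example:

```r
## Hypothetical CSV content: first column = sequence names, other columns = indels
## (1 = indel present in that sequence, 0 = absent)
toy_csv <- "name,A,B
seq1,0,1
seq2,1,1
seq3,0,1
seq4,1,0"

## row.names = 1 uses the first column as row names
toy_variables <- read.csv(text = toy_csv, row.names = 1)

## Tip labels as they would come from my_tree$tip.label (hypothetical here)
toy_tip_labels <- c("seq3", "seq1", "seq4", "seq2")

## Names found in one object but not in the other (both should be empty)
missing_in_data <- setdiff(toy_tip_labels, rownames(toy_variables))
missing_in_tree <- setdiff(rownames(toy_variables), toy_tip_labels)

## Reordering the rows to match the order of the tip labels
toy_variables <- toy_variables[toy_tip_labels, , drop = FALSE]
toy_variables
```

With a real file you would replace text = toy_csv by the path to your file and toy_tip_labels by my_tree$tip.label . Checking the names first is worthwhile because a single mismatched label is a common cause of errors later on.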

3 - Running an ancestral character estimation

Finally, you can run the ancestral character estimation for each of your variables using the ace function from the package ape :

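## Selecting the variable of interest (e.g. the first column of the dataset)
## Note: unless the vector has names matching the tip labels, the values
## must be in the same order as my_tree$tip.label
one_variable <- my_variables[, 1]
## Running the ancestral character estimation for this variable
my_ace <- ace(x = one_variable, phy = my_tree, type = "discrete")
## Looking at the results
my_ace
## Scaled likelihood of each state at each ancestral node
my_ace$lik.anc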
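
To repeat this for every indel rather than one column at a time, something along the following lines could work. This is only a sketch under several assumptions: the tree and the 0/1 table are invented, the helper run_one_indel is a hypothetical name, and the equal-rates model ( model = "ER" ) is used as a simple starting point, not as a recommendation for your data:

```r
library(ape)

## Made-up rooted, binary tree with branch lengths (hypothetical tip names)
toy_tree <- read.tree(text = "(((t1:1,t2:1):1,(t3:1,t4:1):1):1,((t5:1,t6:1):1,(t7:1,t8:1):1):1);")

## Made-up presence (1) / absence (0) table: one row per tip, one column per indel
toy_variables <- data.frame(
  A = c(1, 1, 1, 1, 0, 0, 0, 0),
  B = c(1, 1, 0, 0, 0, 0, 0, 0),
  row.names = paste0("t", 1:8)
)

## Hypothetical helper: fits ace for one indel and returns the node-by-state matrix
run_one_indel <- function(indel_name) {
  ## Naming the vector lets ace match the values to the tip labels
  states <- setNames(toy_variables[[indel_name]], rownames(toy_variables))
  fit <- ace(x = states, phy = toy_tree, type = "discrete", model = "ER")
  fit$lik.anc
}

## One matrix per indel, stored in a named list
indel_names <- setNames(names(toy_variables), names(toy_variables))
all_results <- lapply(indel_names, run_one_indel)

## Rows = ancestral nodes, columns = states "0" and "1"
all_results$A
```

Each matrix has one row per internal node and one column per state, and each row sums to 1. As far as I understand, the rows follow ape's internal node numbering (starting at the number of tips plus one), which is also the order expected by nodelabels(pie = ...) if you want to draw the results on the plotted tree.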

Of course there is much more to it, but hopefully this will get you started.
