
Loop through columns in S4 objects in R

I am trying to run an association analysis using the snpStats package.

I have an object called 'plink' which contains my genotype data as a list with the components $genotypes, $map and $fam. plink$genotype is a SnpMatrix with the SNP names (2 SNPs) as column names and the subject identifiers as row names:

plink$genotype
SnpMatrix with  6 rows and  2 columns
Row names:  1 ... 6 
Col names:  203 204

The plink dataset can be reproduced by copying the following ped and map files and saving them as 'plink.ped' and 'plink.map' respectively:

plink.ped:

1 1 0 0 1 -9 A A G G
2 2 0 0 2 -9 G A G G
3 3 0 0 1 -9 A A G G
4 4 0 0 1 -9 A A G G
5 5 0 0 1 -9 A A G G
6 6 0 0 2 -9 G A G G

plink.map:

1 203 0 792429
2 204 0 819185
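If you prefer to create the two input files from within R instead of copying them by hand, a minimal base-R sketch with writeLines() follows. It assumes the files should go into the current working directory, which is where the PLINK command below expects them.

```r
# Write the example PLINK text files from the question (base R only).
# Assumption: the files are created in the current working directory.
ped_lines <- c(
  "1 1 0 0 1 -9 A A G G",
  "2 2 0 0 2 -9 G A G G",
  "3 3 0 0 1 -9 A A G G",
  "4 4 0 0 1 -9 A A G G",
  "5 5 0 0 1 -9 A A G G",
  "6 6 0 0 2 -9 G A G G"
)
map_lines <- c(
  "1 203 0 792429",
  "2 204 0 819185"
)
writeLines(ped_lines, "plink.ped")
writeLines(map_lines, "plink.map")
```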

Then run PLINK as follows:

./plink --file plink --make-bed

@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

Web-based version check ( --noweb to skip )
Recent cached web-check found...Problem connecting to web

Writing this text to log file [ plink.log ]
Analysis started: Tue Nov 29 18:08:18 2011

Options in effect:
--file /ugi/home/claudiagiambartolomei/Desktop/plink
--make-bed

 2 (of 2) markers to be included from [ /ugi/home/claudiagiambartolomei/Desktop/plink.map ]
 6 individuals read from [ /ugi/home/claudiagiambartolomei/Desktop/plink.ped ] 
 0 individuals with nonmissing phenotypes
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
Missing phenotype value is also -9
0 cases, 0 controls and 6 missing
4 males, 2 females, and 0 of unspecified sex
Before frequency and genotyping pruning, there are 2 SNPs
6 founders and 0 non-founders found
Total genotyping rate in remaining individuals is 1
0 SNPs failed missingness test ( GENO > 1 )
0 SNPs failed frequency test ( MAF < 0 )
After frequency and genotyping pruning, there are 2 SNPs
After filtering, 0 cases, 0 controls and 6 missing
After filtering, 4 males, 2 females, and 0 of unspecified sex
Writing pedigree information to [ plink.fam ] 
Writing map (extended format) information to [ plink.bim ] 
Writing genotype bitfile to [ plink.bed ] 
Using (default) SNP-major mode

Analysis finished: Tue Nov 29 18:08:18 2011

I also have a phenotype data frame containing the outcomes (outcome1, outcome2, ...) that I would like to test for association with the genotypes:

ID<- 1:6
sex<- rep(1,6)
age<- c(59,60,54,48,46,50)
bmi<- c(26,28,22,20,23, NA)
ldl<- c(5, 3, 5, 4, 2, NA)
pheno<- data.frame(ID,sex,age,bmi,ldl)

The association works for a single outcome when I do this (using the function snp.rhs.tests()):

bmi<-snp.rhs.tests(bmi~sex+age,family="gaussian", data=pheno, snp.data=plink$genotype)
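The first argument of snp.rhs.tests() in this call is a model formula. A formula does not have to be typed literally; it can be built from character strings, which is what makes a loop over outcomes possible. A small base-R sketch (the vector name covariates is illustrative):

```r
# Three ways to obtain the same model formula
covariates <- c("sex", "age")

f1 <- bmi ~ sex + age                              # typed by hand
f2 <- as.formula(paste("bmi", "~ sex + age"))      # pasted from strings
f3 <- reformulate(covariates, response = "bmi")    # built from variable names

print(f3)
```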

My question is: how do I loop through the outcomes? This type of data seems different from anything else I have worked with and I am having trouble manipulating it, so I would also be grateful for suggestions of tutorials that explain how to do this and other manipulations, such as subsetting the SnpMatrix data.

This is what I have tried for the loop:

rhs <- function(x) { 
x<- snp.rhs.tests(x, family="gaussian", data=pheno, 
snp.data=plink$genotype) 
} 
res_ <- apply(pheno,2,rhs) 

Error in x$terms : $ operator is invalid for atomic vectors
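Why this attempt fails can be checked with base R alone. apply() converts the data frame to a matrix and hands each column to the function as a plain numeric vector, not a formula; calling terms() on such a vector raises the same "$ operator is invalid for atomic vectors" message. That snp.rhs.tests() reaches terms() through model.frame() is an assumption here; the sketch only exercises base R.

```r
pheno <- data.frame(ID = 1:6, sex = rep(1, 6),
                    age = c(59, 60, 54, 48, 46, 50),
                    bmi = c(26, 28, 22, 20, 23, NA),
                    ldl = c(5, 3, 5, 4, 2, NA))

# Each column that apply() passes on is a numeric vector, never a formula
col_is_plain_vector <- apply(pheno, 2,
                             function(x) is.numeric(x) && !inherits(x, "formula"))

# terms() on a plain vector fails inside terms.default(), which does x$terms
terms_result <- try(terms(pheno$bmi), silent = TRUE)
```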

Then I tried this:

for (cov in names(pheno)) { 
 association<-snp.rhs.tests(cov, family="gaussian",data=pheno, snp.data=plink$genotype) 
 } 

Error in eval(expr, envir, enclos) : object 'bmi' not found
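In this second attempt cov is a character string such as "bmi", not a formula. The usual pattern is to loop over the outcome names and turn each one into a formula before fitting. In the sketch below lm() is swapped in for snp.rhs.tests() so that it runs with base R only; the snp.rhs.tests() call in the comment is the question's own working call with the formula substituted, and it has not been run here.

```r
pheno <- data.frame(ID = 1:6, sex = rep(1, 6),
                    age = c(59, 60, 54, 48, 46, 50),
                    bmi = c(26, 28, 22, 20, 23, NA),
                    ldl = c(5, 3, 5, 4, 2, NA))

outcomes <- c("bmi", "ldl")   # phenotype columns to test

# One model per outcome, collected in a named list.
# With snpStats the fitting line would instead be:
#   snp.rhs.tests(f, family = "gaussian", data = pheno, snp.data = plink$genotype)
results <- lapply(setNames(outcomes, outcomes), function(outcome) {
  f <- reformulate(c("sex", "age"), response = outcome)
  lm(f, data = pheno)   # sex is constant here, so its coefficient is NA
})
```

Storing the results in a named list keeps every outcome's fit, rather than overwriting a single variable on each pass.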

Thank you as usual for your help! -f

The author of snpStats is David Clayton. Although the website listed in the package description is wrong, he is still at that domain, and you can search for documentation with Google's advanced search by restricting the results to his pages:

snpStats site:https://www-gene.cimr.cam.ac.uk/staff/clayton/

The likely reason for your difficulty is that snpStats uses S4 classes, and S4 objects are accessed differently. Instead of print methods, S4 objects typically have show methods. There is a practical on the package here: https://www-gene.cimr.cam.ac.uk/staff/clayton/courses/florence11/practicals/practical6.pdf , and the directory for his entire short course is openly accessible: https://www-gene.cimr.cam.ac.uk/staff/clayton/courses/florence11/
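What "show methods instead of print methods" means in practice can be seen with a small sketch. The class ToyResult and its slots are hypothetical, made up for illustration, and are not part of snpStats.

```r
library(methods)

# Hypothetical S4 class, used only to illustrate show() methods
setClass("ToyResult", slots = c(snp.names = "character", chisq = "numeric"))

# S4 objects are displayed by their show() method, not by print()
setMethod("show", "ToyResult", function(object) {
  cat("ToyResult with", length(object@snp.names), "SNPs\n")
})

res <- new("ToyResult", snp.names = c("203", "204"), chisq = c(0.5, 1.2))
res                                   # auto-printing calls show(res)
shown <- capture.output(show(res))
```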

That document shows (p. 7) that the object returned by snp.rhs.tests() can be subset with "[" using either positions or names. You can get the names with names():

# Using the example on the help(snp.rhs.tests) page:

> names(slt3)
 [1] "173760" "173761" "173762" "173767" "173769" "173770" "173772" "173774"
 [9] "173775" "173776"
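The same kind of "[" access by position or by name can be sketched for a hypothetical S4 class. ToyTests and its "[" method are illustrative assumptions, not the snpStats implementation; they only show how an S4 class can support both kinds of index.

```r
library(methods)

# Hypothetical class standing in for a per-SNP results object
setClass("ToyTests", slots = c(snp.names = "character", chisq = "numeric"))

# "[" method: a character index is matched against the SNP names
setMethod("[", "ToyTests", function(x, i, j, ..., drop = TRUE) {
  if (is.character(i)) i <- match(i, x@snp.names)
  new("ToyTests", snp.names = x@snp.names[i], chisq = x@chisq[i])
})

tt <- new("ToyTests", snp.names = c("203", "204"), chisq = c(0.5, 1.2))
by_pos  <- tt[2]       # subset by position
by_name <- tt["204"]   # subset by SNP name
```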

The things you may be calling "columns" are probably slots:

> getSlots(class(slt3))
  snp.names   var.names       chisq          df           N 
      "ANY" "character"   "numeric"   "integer"   "integer" 
> str(getSlots(class(slt3)))
 Named chr [1:5] "ANY" "character" "numeric" "integer" "integer"
 - attr(*, "names")= chr [1:5] "snp.names" "var.names" "chisq" "df" ...
> names(getSlots(class(slt3)))
[1] "snp.names" "var.names" "chisq"     "df"        "N"        

There is, however, no [i,j] method for looping over those slots. Instead, look at the help page ?"GlmTests-class", which lists the methods defined for that S4 class.
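Slots can be listed and read generically with slotNames() and slot(), whereas matrix-style indexing such as obj[1, "chisq"] fails unless the class defines a "[" method. A sketch with a hypothetical class ToyGlm, whose slot names merely echo some of those printed by getSlots():

```r
library(methods)

# Hypothetical S4 class with three slots and no "[" method
setClass("ToyGlm", slots = c(snp.names = "character", chisq = "numeric", df = "integer"))

obj <- new("ToyGlm", snp.names = c("203", "204"), chisq = c(0.5, 1.2), df = c(1L, 1L))

# Loop over slots by name; slot(obj, s) is the programmatic form of obj@s
slot_lengths <- vapply(slotNames(obj), function(s) length(slot(obj, s)), integer(1))

# Without a "[" method an S4 object is not subsettable like a matrix
bracket_result <- try(obj[1, "chisq"], silent = TRUE)
```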

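To do what the original poster asked, loop over the outcome names, build a formula for each one and keep every result in a list:

results <- list()
for (outcome in c("bmi", "ldl")) {
  f <- as.formula(paste(outcome, "~ sex + age"))   # e.g. bmi ~ sex + age
  results[[outcome]] <- snp.rhs.tests(f, family="gaussian", data=pheno, snp.data=plink$genotype)
}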
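A quick base-R check of why the loop header matters: ncol(pheno) is a single number, so for (i in ncol(pheno)) runs its body only once, on the last column, whereas seq_len(ncol(pheno)) visits every column. The column means stand in for the association call purely to show results being collected in a list.

```r
pheno <- data.frame(ID = 1:6, sex = rep(1, 6),
                    age = c(59, 60, 54, 48, 46, 50),
                    bmi = c(26, 28, 22, 20, 23, NA),
                    ldl = c(5, 3, 5, 4, 2, NA))

# ncol(pheno) is 5, so this loop runs exactly once
visited_once <- integer(0)
for (i in ncol(pheno)) visited_once <- c(visited_once, i)

# seq_len(ncol(pheno)) is 1:5, so every column is visited
visited_all <- integer(0)
for (i in seq_len(ncol(pheno))) visited_all <- c(visited_all, i)

# Keep one result per outcome in a list instead of overwriting a variable
col_means <- list()
for (nm in c("bmi", "ldl")) col_means[[nm]] <- mean(pheno[[nm]], na.rm = TRUE)
```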

As I read the documentation of snp.rhs.tests(), the variables in the formula are evaluated in the data frame given as data, and in the parent frame if data is not specified.

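Alternatively, the outcome can be passed as a plain vector living in the calling frame:

association <- list()
for (i in 4:ncol(pheno)) {   # columns 4 and 5 of pheno are bmi and ldl
  cc <- pheno[, i]
  # data= is omitted, so cc, sex and age are looked up in the calling frame;
  # sex and age are the vectors defined before pheno was built
  association[[names(pheno)[i]]] <- snp.rhs.tests(cc ~ sex + age, family="gaussian", snp.data=plink$genotype)
}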

The documentation gives data = parent.frame() as the default in snp.rhs.tests().
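The lookup rule can be illustrated with base R's model.frame(). This assumes, without checking the snpStats source, that snp.rhs.tests() resolves formula variables the same way: columns of data take precedence, and anything not found there is taken from the formula's environment.

```r
pheno <- data.frame(age = c(59, 60, 54, 48, 46, 50),
                    bmi = c(26, 28, 22, 20, 23, NA))

bmi <- rep(0, 6)   # a global vector with the same name as a column
cc  <- 1:6         # a phenotype that exists only outside the data frame

# bmi comes from pheno, not from the global vector; the default
# na.action (na.omit) drops the row with the missing bmi
mf_data <- model.frame(bmi ~ age, data = pheno)

# cc is not a column of pheno, so it is found in the formula's environment
mf_env <- model.frame(cc ~ age, data = pheno)
```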

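There is also a problem in the apply() code. Please do not write x <- some.fun(x) inside a function: it only rebinds a local variable and makes the return value easy to miss. More importantly, apply() hands each column to snp.rhs.tests() as a plain numeric vector, while its first argument must be a formula. Try this instead: drop the data=, make each column the response of a formula, use a different variable name and return it:

rhs <- function(x) {
  y <- snp.rhs.tests(x ~ sex + age, family="gaussian", snp.data=plink$genotype)
  y   # return the result explicitly
}
# lapply() over the outcome columns gives a list with one result per outcome
res_ <- lapply(pheno[c("bmi", "ldl")], rhs)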
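For reference, this is how R treats a function whose last expression is an assignment: reusing the name x only rebinds a local variable, the caller's object is untouched, and the assigned value is still returned, but invisibly, which is easy to miss when reading the code. Checked with two hypothetical helpers:

```r
# Hypothetical helpers contrasting assignment and explicit return
f_assign <- function(x) {
  x <- x * 2   # rebinds the local x only; the value is returned invisibly
}
f_return <- function(x) {
  y <- x * 2
  y            # explicit last expression, returned visibly
}

v <- c(1, 2, 3)
out_assign <- f_assign(v)
out_return <- f_return(v)
```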

The title of the original question is also somewhat misleading.

plink$genotype is an S4 object, but pheno is a data.frame (an S3 object). What you really want is to select columns of an ordinary S3 data.frame; you are thrown off course by how snp.rhs.tests() finds the variables in its formula: as columns of data when a data frame is given, or otherwise as plain vectors in the parent frame, i.e. your "current" frame, since the function body is evaluated in a "child" frame.
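A short base-R sketch of the distinction: pheno is not an S4 object, so its outcome columns can be selected by name with "[" and looped over like the elements of a list. The choice of bmi and ldl and the count of non-missing values are illustrative only.

```r
library(methods)

pheno <- data.frame(ID = 1:6,
                    bmi = c(26, 28, 22, 20, 23, NA),
                    ldl = c(5, 3, 5, 4, 2, NA))

is_s4_pheno  <- isS4(pheno)              # a data.frame is an S3 object
outcome_cols <- pheno[c("bmi", "ldl")]   # ordinary column selection by name

# A data frame is a list of columns, so vapply() loops over them directly
n_complete <- vapply(outcome_cols, function(col) sum(!is.na(col)), integer(1))
```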
