
R: How to recode values based on first characters 0|0 vs 1|0 vs 0|1 vs 1|1

Thank you in advance for the help.

I am trying to recode a genetic database that contains genotypes coded in VCF format. For context, each genotype entry looks like this: '0|0:0,0:0:1,0,0'. The main thing I am interested in is the two alleles at the start, i.e. the first three characters including the |: the 0|0 in '0|0:0,0:0:1,0,0'. If these are 0|0, the person has two copies of the reference allele, which I am treating as two dominant alleles. If they are 1|1, the person has two recessive (alternate) alleles. 1|0 and 0|1 are a mix of the two.
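A minimal sketch of isolating that genotype part in R, assuming (as in the example above) that the genotype is always the first field before the first ":". The vector entries is made-up sample data, not the real gg.

```r
# Made-up entries in the question's format (assumption: genotype is the first ":"-separated field)
entries <- c("0|0:0,0:0:1,0,0", "1|0:0,0:0:1,0,0", "1|1:0,0:0:1,0,0")

# Drop everything from the first ":" onwards, leaving only the genotype
gt <- sub(":.*$", "", entries)
print(gt)
# [1] "0|0" "1|0" "1|1"
```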

I am working on a data frame called "gg" that contains approx 120 columns (one for each SNP) and 1500 rows (one for each subject in the study).

I am trying to recode the SNP from its current format to a more easily analysable format:

  • 0|0 = two dominant alleles - recode as 0
  • 0|1 or 1|0 = a mix of one dominant and one recessive allele - recode as 1
  • 1|1 = two recessive alleles - recode as 2
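The mapping in the list above can be written as a named lookup vector indexed by the first three characters of each entry. This is a sketch on invented data; gt_codes is a hypothetical name, and any prefix not in the table (for example a missing genotype) comes back as NA instead of being forced into a category.

```r
# Hypothetical lookup table: genotype string -> number of recessive ("1") alleles
gt_codes <- c("0|0" = 0L, "0|1" = 1L, "1|0" = 1L, "1|1" = 2L)

entries <- c("0|0:0,0:0:1,0,0", "0|1:0,0:0:1,0,0",
             "1|0:0,0:0:1,0,0", "1|1:0,0:0:1,0,0")

# Use the first three characters as the lookup key; unname() drops the key names
recoded <- unname(gt_codes[substr(entries, 1, 3)])
print(recoded)
# [1] 0 1 1 2
```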

I have attempted several approaches. The latest thing I have attempted has got close-ish. I tried the following:

gg[grep("0|0", gg)] <- "0"

Oddly, this turns every value in the WHOLE database into 0. I think this is because the pattern 0|0 is being read as 'if the value contains a zero or a zero, recode as zero' (and every value contains at least one zero).
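The guess about the pattern is right: in R regular expressions | means "or", so "0|0" matches any value that contains a 0. A likely second factor is that grep() coerces a data frame with as.character(), producing one string per column, so the indices it returns are column numbers and gg[...] <- "0" then overwrites whole columns. The sketch below, on made-up strings, contrasts the original pattern with an escaped, anchored one and with startsWith(), which needs no regex at all.

```r
x <- c("0|0:0,0:0:1,0,0", "0|1:0,0:0:1,0,0", "1|1:0,0:0:1,0,0")

# "|" is regex alternation, so "0|0" means "0 or 0": anything containing a 0 matches
grepl("0|0", x)
# [1] TRUE TRUE TRUE

# Escaping the pipe and anchoring with ^ matches only a leading "0|0"
grepl("^0\\|0", x)
# [1]  TRUE FALSE FALSE

# Alternatively, compare the literal prefix without any regex
startsWith(x, "0|0")
# [1]  TRUE FALSE FALSE
```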

What I want to express is: recode as 0 if the value starts with the EXACT characters 0|0, recode as 1 if it starts with the EXACT characters 0|1 or 1|0, and recode as 2 if it starts with the EXACT characters 1|1.
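Turning that exact-prefix rule into code, here is a sketch that recodes every column of a data frame. recode_gt is a hypothetical helper name, and the two-column gg below is an invented stand-in for the real 120-column data; entries that match none of the four prefixes are left as NA rather than guessed.

```r
# Hypothetical helper: recode one column of VCF-style strings to 0/1/2
recode_gt <- function(x) {
  x <- as.character(x)                       # factor columns (older R defaults) become strings
  out <- rep(NA_integer_, length(x))         # anything that matches no prefix stays NA
  out[startsWith(x, "0|0")] <- 0L
  out[startsWith(x, "0|1") | startsWith(x, "1|0")] <- 1L
  out[startsWith(x, "1|1")] <- 2L
  out
}

# Invented stand-in for gg: subjects in rows, SNPs in columns
gg <- data.frame(
  snp1 = c("0|0:0,0:0:1,0,0", "0|1:0,0:0:1,0,0", "1|1:0,0:0:1,0,0"),
  snp2 = c("1|0:0,0:0:1,0,0", "0|0:0,0:0:1,0,0", "0|1:0,0:0:1,0,0"),
  stringsAsFactors = FALSE
)

# gg[] <- keeps the data frame structure while replacing every column
gg[] <- lapply(gg, recode_gt)
print(gg)
```

Using gg[] rather than gg on the left-hand side keeps the column names and the data.frame class, so the same line should work on the full data set as long as every column holds these strings.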

Try the code below

# Drop "|", keep the two allele digits, split them and count the "1"s (list2DF() needs R >= 4.0.0)
colSums(list2DF(strsplit(substr(gsub("\\|","",gg),1,2),""))=="1")

which gives

0 1 1 2

Dummy Data

gg <- c('0|0:0,0:0:1,0,0','10:0,0:0:1,0,0','0|1:0,0:0:1,0,0','11:0,0:0:1,0,0')
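Since each allele is coded as 0 or 1, counting the "1" characters is the same as adding the two allele digits. The sketch below, using the same dummy data, does that arithmetic directly and avoids list2DF(), which only exists in R 4.0.0 and later. The variable names are illustrative, and the approach assumes single-digit alleles.

```r
gg <- c('0|0:0,0:0:1,0,0','10:0,0:0:1,0,0','0|1:0,0:0:1,0,0','11:0,0:0:1,0,0')

# Remove any "|" so the two allele digits are the first two characters
alleles <- substr(gsub("|", "", gg, fixed = TRUE), 1, 2)

# Each allele is 0 or 1, so the number of "1" alleles is simply their sum
n_ones <- as.integer(substr(alleles, 1, 1)) + as.integer(substr(alleles, 2, 2))
print(n_ones)
# [1] 0 1 1 2
```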

A slightly modified option is

# Rewrite each entry as "a,b" (the two allele digits), parse as CSV, count the 1s per row
rowSums(read.csv(text = sub("^(\\d)\\|?(\\d).*", "\\1,\\2", gg), 
         header = FALSE) == 1)
#[1] 0 1 1 2

data

gg <- c('0|0:0,0:0:1,0,0','10:0,0:0:1,0,0','0|1:0,0:0:1,0,0','11:0,0:0:1,0,0')
