
Creating a new column in R with help of 3 existing columns

I want to create a new column, "non_coded", from three existing columns: allele_2, allele_1 and A1.

The conditions I want satisfied are:

if allele_2 == A1 then non_coded = allele_1

if allele_2 != A1 then non_coded = allele_2

Thanks in advance,

Rad

OK, this is what the data looks like:

SNPID          chrom STRAND IMPUTED allele_2 allele_1     MAF CALL_RATE HET_RATE
1  rs1000000    12      +       Y        A        G 0.12160   1.00000   0.2146
2 rs10000009     4      +       Y        G        A 0.07888   0.99762   0.1386

     HWP    RSQ  PHYS_POS A1 M1_FRQ M1_INFO M1_BETA  M1_SE    M1_P
1 1.0000 0.9817 125456933  A 0.1173  0.9452 -0.0113 0.0528 0.83090
2 0.1164 0.8354  71083542  A 0.9048  0.9017 -0.0097 0.0593 0.87000

The code I tried:

Hy_MVA$non_coded <- ifelse(Hy_MVA$allele_2 == Hy_MVA$A1, Hy_MVA$allele_1, Hy_MVA$allele_2)

Result:

 SNPID       chrom STRAND IMPUTED allele_2 allele_1     MAF CALL_RATE HET_RATE
1  rs1000000    12    +       Y        A        G 0.12160   1.00000   0.2146
2 rs10000009     4    +       Y        G        A 0.07888   0.99762   0.1386

     HWP    RSQ  PHYS_POS A1 M1_FRQ M1_INFO M1_BETA  M1_SE    M1_P non_coded
1 1.0000 0.9817 125456933  A 0.1173  0.9452 -0.0113 0.0528 0.83090         3
2 0.1164 0.8354  71083542  A 0.9048  0.9017 -0.0097 0.0593 0.87000         3

What I want:

SNPID        chrom STRAND IMPUTED allele_2 allele_1     MAF CALL_RATE HET_RATE
1  rs1000000    12    +       Y        A        G 0.12160   1.00000   0.2146
2 rs10000009     4    +       Y        G        A 0.07888   0.99762   0.1386

     HWP    RSQ  PHYS_POS A1 M1_FRQ M1_INFO M1_BETA  M1_SE    M1_P non_coded
1 1.0000 0.9817 125456933  A 0.1173  0.9452 -0.0113 0.0528 0.83090         G
2 0.1164 0.8354  71083542  A 0.9048  0.9017 -0.0097 0.0593 0.87000         G

As Chase said, use ifelse(). I guess the code then becomes:

non_coded <- ifelse(allele_2 == A1, allele_1, allele_2)

Edit

After seeing the updated question, it makes sense that you get numbers: allele_1 and allele_2 are factors, and ifelse() returns their underlying integer codes instead of the labels. Wrapping them in as.character() should fix this:

A1 <- c("A","A","B")
allele_1 <- as.factor(c("A","C","C"))
allele_2 <- as.factor(c("A","B","B"))

non_coded <- ifelse(allele_2 == A1, as.character(allele_1), as.character(allele_2))
non_coded 
[1] "A" "B" "C"
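Applied to the data frame itself, the same fix might look like the sketch below. The two rows are copied from the question; building the allele columns as factors with levels A, C, G, T is an assumption, chosen because it reproduces the 3s shown in the question's output.

```r
# Two rows from the question. The factor levels are an assumption,
# chosen because they reproduce the integer codes seen in the question.
Hy_MVA <- data.frame(
  SNPID    = c("rs1000000", "rs10000009"),
  allele_2 = factor(c("A", "G"), levels = c("A", "C", "G", "T")),
  allele_1 = factor(c("G", "A"), levels = c("A", "C", "G", "T")),
  A1       = c("A", "A"),
  stringsAsFactors = FALSE
)

# ifelse() drops the factor attributes, so this returns integer codes.
codes <- ifelse(Hy_MVA$allele_2 == Hy_MVA$A1, Hy_MVA$allele_1, Hy_MVA$allele_2)

# Converting to character first keeps the allele letters.
Hy_MVA$non_coded <- ifelse(
  as.character(Hy_MVA$allele_2) == as.character(Hy_MVA$A1),
  as.character(Hy_MVA$allele_1),
  as.character(Hy_MVA$allele_2)
)
```

Running str(Hy_MVA) on the real data is a quick way to check whether the allele columns are in fact factors.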

Since you want non_coded to be one of two values:

# Start with allele_2 everywhere, then overwrite the rows where allele_2 matches A1
Hy_MVA$non_coded <- Hy_MVA$allele_2
matches <- Hy_MVA$allele_2 == Hy_MVA$A1
Hy_MVA$non_coded[matches] <- Hy_MVA$allele_1[matches]

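That replaces the value with allele_1 only in the rows where allele_2 == A1; every other row keeps allele_2. If the columns are factors, the assignment only works cleanly when every allele_1 value is also a level of allele_2; otherwise R inserts NA with a warning. The numbers in your result suggest that ifelse() is returning a factor's integer codes rather than its labels.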
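If the factor conversion is indeed the culprit, another option is to turn the allele columns into character vectors once, before building non_coded. This is a minimal sketch, assuming the data frame is called Hy_MVA as in the question; the toy values and the allele_cols helper name are made up for illustration.

```r
# Toy stand-in for Hy_MVA (hypothetical values), with factor allele columns.
Hy_MVA <- data.frame(
  allele_2 = factor(c("A", "G", "T")),
  allele_1 = factor(c("G", "A", "C")),
  A1       = factor(c("A", "A", "C"))
)

# Convert the three columns to character in one step.
allele_cols <- c("allele_2", "allele_1", "A1")
Hy_MVA[allele_cols] <- lapply(Hy_MVA[allele_cols], as.character)

# With plain character vectors, the original ifelse() call keeps the letters.
Hy_MVA$non_coded <- ifelse(Hy_MVA$allele_2 == Hy_MVA$A1,
                           Hy_MVA$allele_1,
                           Hy_MVA$allele_2)
```

If the data come from read.table() or read.csv(), passing stringsAsFactors = FALSE when reading the file avoids the conversion step altogether.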
