Let's say I have the following data:
SNP eff_allele A1 A2
rs1000000 A A G
rs10000010 C C T
rs1000002 T T C
rs10000023 G T G
I want to create a new variable, alt_allele, that takes the value of either column A1 or column A2, depending on the value of the column eff_allele. If eff_allele equals A1, then alt_allele should get the value of A2, and if eff_allele equals A2, then alt_allele should get the value of A1. I made two attempts:
Attempt 1:
if (myData$eff_allele == myData$A1) {
myData$alt_allele <- myData$A2
}
if (myData$eff_allele == myData$A2) {
myData$alt_allele <- myData$A1
}
Attempt 2:
height_fam$alt_allele[height_fam$eff_allele == height_fam$A1] <- height_fam$A2
height_fam$alt_allele[height_fam$eff_allele == height_fam$A2] <- height_fam$A1
Neither of these works. What am I doing wrong? How can I achieve the following update to my data:
SNP eff_allele A1 A2 alt_allele
rs1000000 A A G G
rs10000010 C C T T
rs1000002 T T C C
rs10000023 G T G T
In R and MATLAB, try not to use loops; they are slow. Try to solve your problem with vectors instead.
Edit: Oh, I misread your question. You didn't use loops anyway :)
a = read.table("a.csv", sep = " ", header = TRUE)
# Number of rows (dim(a)[2] would give the number of columns instead)
n_rows = nrow(a)
newcol = rep("", n_rows)
A1 = as.character(a$A1)
A2 = as.character(a$A2)
eff_allele = as.character(a$eff_allele)
# a1_ind is TRUE for rows where eff_allele differs from A1, so newcol takes A1 there
a1_ind = eff_allele != A1
newcol[a1_ind] = A1[a1_ind]
# In the remaining rows eff_allele equals A1, so newcol takes A2
newcol[!a1_ind] = A2[!a1_ind]
a = cbind(a, newcol)
and the output will be:
SNP eff_allele A1 A2 newcol
1 rs1000000 A A G G
2 rs10000010 C C T T
3 rs1000002 T T C C
4 rs10000023 G T G T
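As for why the two attempts in the question fail: `if` expects a single TRUE or FALSE, so passing it a whole column comparison either uses only the first element (with a warning) or, in recent R versions, raises an error. In the second attempt the left-hand side selects only some rows while the right-hand side is the full column, so the replacement values are not aligned with the rows being filled. A minimal sketch of two vectorized alternatives follows. It rebuilds the example data inline rather than reading a file, and assumes the allele columns are character vectors rather than factors; the names `is_a1` and `alt_allele2` are made up for illustration.

```r
# Example data from the question; stringsAsFactors = FALSE keeps alleles as character
myData <- data.frame(
  SNP        = c("rs1000000", "rs10000010", "rs1000002", "rs10000023"),
  eff_allele = c("A", "C", "T", "G"),
  A1         = c("A", "C", "T", "T"),
  A2         = c("G", "T", "C", "G"),
  stringsAsFactors = FALSE
)

# Option 1: ifelse() tests the condition row by row.
# Rows where eff_allele matches neither A1 nor A2 get NA.
myData$alt_allele <- ifelse(
  myData$eff_allele == myData$A1, myData$A2,
  ifelse(myData$eff_allele == myData$A2, myData$A1, NA_character_)
)

# Option 2: the second attempt, repaired by applying the same
# logical index on both sides of the assignment.
is_a1 <- myData$eff_allele == myData$A1
myData$alt_allele2 <- NA_character_
myData$alt_allele2[is_a1]  <- myData$A2[is_a1]
myData$alt_allele2[!is_a1] <- myData$A1[!is_a1]

print(myData)
```

Option 1 is usually the shortest to read. Option 2 stays closest to the original attempt, but it silently assigns A1 to any row where eff_allele matches neither allele column, so it only fits data where that case cannot occur.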