So I have a massive file looking like this:
ID SNP A1 A2
104 sr_1 A G
104 sr_2 C C
104 sr_3 C A
105 sr_1 A A
105 sr_2 C G
105 sr_3 C C
106 sr_1 A A
106 sr_2 C C
106 sr_3 C C
. . . .
. . . .
. . . .
What I want to do is replace every "G" in the A2 column with "A", but only in rows where SNP is sr_1. Rows for other SNPs should stay untouched.
The result would be:
ID SNP A1 A2
104 sr_1 A A
104 sr_2 C C
104 sr_3 C A
105 sr_1 A A
105 sr_2 C G
105 sr_3 C C
106 sr_1 A A
106 sr_2 C C
106 sr_3 C C
. . . .
. . . .
. . . .
I have many, many sr_1 rows with incorrect A2 values. I have tried some VLOOKUP options in Excel/LibreOffice and some functions that transpose the table in R, but I cannot find a good solution...
Any help?
If you're using Excel, in a new column next to A2, try the formula below and fill it down:
=IF(AND(B2="sr_1",D2="G"),"A",D2)
If the SNP column is sr_1 and the A2 column is G, it returns A; otherwise it returns the existing value in A2. Then copy the filled-down column and paste it as values over the A2 column to update it.
In base R you could use simple subsetting rules with [:
# Select A2 where SNP is sr_1 and A2 is G, then replace it with A
df$A2[df$SNP == 'sr_1' & df$A2 == 'G'] <- 'A'
df
# ID SNP A1 A2
#1 104 sr_1 A A
#2 104 sr_2 C C
#3 104 sr_3 C A
#4 105 sr_1 A A
#5 105 sr_2 C G
#6 105 sr_3 C C
#7 106 sr_1 A A
#8 106 sr_2 C C
#9 106 sr_3 C C
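Since the question mentions a massive file rather than a data frame already in memory, here is a minimal sketch of the full round trip in base R. It assumes the file is space-separated with the header shown in the question; the paths are hypothetical (temp files stand in for the real data), and the explicit colClasses is my addition to keep allele columns as text.

```r
# Hypothetical input: a temp file stands in for the real data file.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "ID SNP A1 A2",
  "104 sr_1 A G",
  "104 sr_2 C C",
  "104 sr_3 C A",
  "105 sr_1 A A",
  "105 sr_2 C G"
), path)

# Read the allele columns as character. Without colClasses, a column that
# contains only "T" would be parsed as logical TRUE.
df <- read.table(path, header = TRUE,
                 colClasses = c("integer", "character", "character", "character"))

# Rows to fix: SNP is sr_1 and A2 is G.
fix <- df$SNP == "sr_1" & df$A2 == "G"
df$A2[fix] <- "A"

# Write the corrected table back out, space-separated, no quotes or row names.
out <- tempfile(fileext = ".txt")
write.table(df, out, quote = FALSE, row.names = FALSE)
```

sum(fix) gives the number of rows that were changed, which is a quick sanity check before overwriting anything.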
You can use the sqldf package with an SQL UPDATE statement, like the following:
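library(sqldf)
# The UPDATE runs on sqldf's internal copy of df, so a SELECT must follow it
sql1 <- "UPDATE df SET A2 = 'A' WHERE A2 = 'G' AND SNP = 'sr_1'"
sql2 <- "select * from df"
# Returns the updated table; assign the result to keep it, since df itself is not modified
sqldf(c(sql1, sql2))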
Try this formula in an empty column, starting in row 2:
=IF(AND(B2="sr_1",D2="G"),"A",D2)
Copy it down. Then copy the result and paste it as values over column D.
You didn't specify how you want to accomplish that. Is it through a formula, or through macro code?