
Multiply values in a dataset by values in another dataset in R

I have two datasets that share a common ID variable and the same n variables, named SNP1 to SNPn. An example of the two datasets is shown below.

Dataset 1

ID SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7
1   0    1    1    0    0    0    0
2   1    1    0    0    0    0    0
3   1    0    0    0    1    1    0
4   0    1    1    0    0    0    0
5   1    0    0    0    1    1    0
6   1    0    0    0    1    1    0
7   0    1    1    0    0    0    0

Dataset 2

ID SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7
1  0.65 1.3  2.8  0.43 0.62 0.9  1.5
2  0.74 1.6  3.4  0.9  2.4  4.4  2.3
3  0.28 0.5  5.7  6.7  0.3  2.5  0.56
4  0.74 1.6  3.4  0.9  2.4  4.4  2.3
5  0.65 1.3  2.8  0.43 0.62 0.9  1.5
6  0.74 1.6  3.4  0.9  2.4  4.4  2.3
7  0.28 0.5  5.7  6.7  0.3  2.5  0.56

I would like to multiply each value at a given position in dataset 1 by the value at the same position in dataset 2.

For example, I would like to multiply position [1,2] in dataset 1 (value = 0) by position [1,2] in dataset 2 (value = 0.65). My dataset is very large: it spans almost 300 columns and 500,000 IDs.

In reality the SNP variable names are longer (for example, Affx.5869593), so I cannot just write SNP1-300 in my code; the columns would have to be selected by position.

Do I need to unlist both datasets by person ID and SNP name first? What function can be used for multiplying values within two datasets?

I am assuming that you are trying to return a third dataframe which will have, in each position, the product of the values that were in that position in the two data frames.

For example, if the following are your two dataframes

df1 <- structure(list(ID = c(1, 2, 3, 4, 5), SNP1a = c(0, 1, 1, 0, 1
), SNP2a = c(1, 1, 0, 1, 0)), class = "data.frame", row.names = c(NA, 
-5L))

ID  SNP1a  SNP2a
1     0     1
2     1     1
3     1     0
4     0     1
5     1     0

df2 <- structure(list(ID = c(1, 2, 3, 4, 5), SNP1b = c(0.65, 0.74, 
0.28, 0.74, 0.65), SNP2b = c(1.3, 1.6, 0.5, 1.6, 1.3)), class = 
"data.frame", row.names = c(NA, -5L))

ID SNP1b SNP2b
1  0.65   1.3
2  0.74   1.6
3  0.28   0.5
4  0.74   1.6
5  0.65   1.3

Then

df3 <- df1[,2:3] * df2[,2:3]

  SNP1a  SNP2a
1  0.00   1.3
2  0.74   1.6
3  0.28   0.0
4  0.00   1.6
5  0.65   0.0

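This works as long as the two data frames have the same dimensions and their rows are in the same order. Note that the result takes its column names from df1 and does not include the ID column.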
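
If the rows of the two data frames might not be in the same order, it is probably safer to align them on ID before multiplying. The following is only a minimal sketch, assuming that ID is the first column in both data frames, that every ID in df1 also appears in df2, and that the SNP columns are in the same order in both. The names idx and snp_cols are made up for illustration.

```r
# Toy data: same IDs, but df2 is in a different row order
df1 <- data.frame(ID = c(1, 2, 3), SNP1a = c(0, 1, 1), SNP2a = c(1, 1, 0))
df2 <- data.frame(ID = c(3, 1, 2), SNP1b = c(0.28, 0.65, 0.74),
                  SNP2b = c(0.5, 1.3, 1.6))

# Row positions in df2 that correspond to each ID in df1
idx <- match(df1$ID, df2$ID)
stopifnot(!anyNA(idx))  # assumption: every ID in df1 exists in df2

# Select the SNP columns by position (everything except the first column),
# so the real column names never have to be typed out
snp_cols <- 2:ncol(df1)

# Element-wise product, with the ID column put back in front
df3 <- cbind(ID = df1$ID, df1[, snp_cols] * df2[idx, snp_cols])
```

Selecting by position (2:ncol(df1)) is what lets this scale to a few hundred SNP columns with long names such as Affx.5869593.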

If your data frames have an identical set of IDs and are the same size, you could sort both by id and then do this:

df <- data.frame(
  id = c(1,2,3,4,5),
  snp1 = c(0,0,1,0,0),
  snp2 = c(1,1,1,0,1)
)

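df2 <- data.frame(
  # use `=` here: `<-` inside data.frame() would produce mangled column names
  id = c(1,2,3,4,5),
  snp1 = c(0.3,0.2,0.3,0.1,0.2),
  snp2 = c(0.5,0.8,0.2,0.3,0.3)
)

# mapply() multiplies the columns pairwise and returns a matrix,
# so convert it back to a data frame before adding the id column
res <- as.data.frame(mapply(`*`, df[,-1], df2[,-1]))
res$id <- df$id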
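
The sorting step mentioned above is not shown in that snippet. A possible sketch of the whole sequence follows, assuming both data frames contain exactly the same ids and the same SNP columns in the same order. It multiplies numeric matrices instead of calling mapply(), which may be a reasonable choice for a table of roughly 500,000 rows by 300 columns, although that has not been benchmarked here. The name prod_mat is made up for illustration.

```r
# Toy data: df is deliberately out of order
df  <- data.frame(id = c(2, 1, 3), snp1 = c(0, 1, 1), snp2 = c(1, 0, 1))
df2 <- data.frame(id = c(1, 2, 3), snp1 = c(0.3, 0.2, 0.3),
                  snp2 = c(0.5, 0.8, 0.2))

# Sort both data frames by id so that the rows line up
df  <- df[order(df$id), ]
df2 <- df2[order(df2$id), ]
stopifnot(identical(df$id, df2$id))  # assumption: identical sets of ids

# Multiply element-wise as numeric matrices, dropping the id column first
prod_mat <- as.matrix(df[, -1]) * as.matrix(df2[, -1])

# Rebuild a data frame with the id column in front
res <- data.frame(id = df$id, prod_mat, row.names = NULL)
```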

