Multiply values in a dataset by values in another dataset in R

I have two datasets that share a common ID variable and n variables denoted SNP1 to SNPn. An example of the two datasets is shown below.

Dataset 1

ID SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7
1   0    1    1    0    0    0    0
2   1    1    0    0    0    0    0
3   1    0    0    0    1    1    0
4   0    1    1    0    0    0    0
5   1    0    0    0    1    1    0
6   1    0    0    0    1    1    0
7   0    1    1    0    0    0    0

Dataset 2

ID SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7
1  0.65 1.3  2.8  0.43 0.62 0.9  1.5
2  0.74 1.6  3.4  0.9  2.4  4.4  2.3
3  0.28 0.5  5.7  6.7  0.3  2.5  0.56
4  0.74 1.6  3.4  0.9  2.4  4.4  2.3
5  0.65 1.3  2.8  0.43 0.62 0.9  1.5
6  0.74 1.6  3.4  0.9  2.4  4.4  2.3
7  0.28 0.5  5.7  6.7  0.3  2.5  0.56

I would like to multiply each value at a given position in data frame 1 by the value at the equivalent position in data frame 2.

For example, I would like to multiply position [1,2] in dataset 1 (value = 0) by position [1,2] in dataset 2 (value = 0.65). My dataset is very large and spans almost 300 columns and 500,000 IDs.

The variable names for SNP1 to SNPn are longer in reality (for example, they actually read Affx.5869593), so I cannot just use SNP1-300 in my code; the columns would have to be specified by column number.

Do I need to unlist both datasets by person ID and SNP name first? What function can be used for multiplying values within two datasets?

I am assuming that you are trying to return a third data frame that holds, in each position, the product of the values at that position in the two data frames.

For example, if the following are your two data frames:

df1 <- structure(list(ID = c(1, 2, 3, 4, 5), SNP1a = c(0, 1, 1, 0, 1
), SNP2a = c(1, 1, 0, 1, 0)), class = "data.frame", row.names = c(NA, 
-5L))

ID  SNP1a  SNP2a
1     0     1
2     1     1
3     1     0
4     0     1
5     1     0

df2 <- structure(list(ID = c(1, 2, 3, 4, 5), SNP1b = c(0.65, 0.74,
0.28, 0.74, 0.65), SNP2b = c(1.3, 1.6, 0.5, 1.6, 1.3)), class =
"data.frame", row.names = c(NA, -5L))

ID SNP1b SNP2b
1  0.65   1.3
2  0.74   1.6
3  0.28   0.5
4  0.74   1.6
5  0.65   1.3

Then

df3 <- df1[,2:3] * df2[,2:3]

  SNP1a SNP2a
1  0.00   1.3
2  0.74   1.6
3  0.28   0.0
4  0.00   1.6
5  0.65   0.0

will work, as long as the two data frames have the same dimensions and their rows are in the same order. The result takes its column names from the first data frame.
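For the wider case in the question (around 300 SNP columns with names such as Affx.5869593), the same multiplication can be written without typing any SNP names. The following is only a minimal sketch, assuming the ID is the first column of both data frames and the SNP columns appear in the same order in both. The column names `Affx.1` and `Affx.2` and the object names `snp_cols` and `prod_df` are made up for illustration.

```r
# Toy data standing in for the two datasets (column names are hypothetical).
df1 <- data.frame(ID = 1:3,
                  Affx.1 = c(0, 1, 1),
                  Affx.2 = c(1, 1, 0))
df2 <- data.frame(ID = 1:3,
                  Affx.1 = c(0.65, 0.74, 0.28),
                  Affx.2 = c(1.3, 1.6, 0.5))

# Assumption: column 1 is the ID; every other column is a SNP.
snp_cols <- names(df1)[-1]

# Guard against mismatched shapes, column order, or row order.
stopifnot(identical(dim(df1), dim(df2)),
          identical(names(df1), names(df2)),
          identical(df1$ID, df2$ID))

# Element-wise product of the SNP columns, with the ID column re-attached.
prod_df <- cbind(ID = df1$ID, df1[snp_cols] * df2[snp_cols])
```

Selecting with `names(df1)[-1]` rather than `2:3` means the code does not change when the number of SNP columns does.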

If your data frames have identical sets of IDs and are the same size, you could sort both by ID and do this:

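df <- data.frame(
  id = c(1,2,3,4,5),
  snp1 = c(0,0,1,0,0),
  snp2 = c(1,1,1,0,1)
)

df2 <- data.frame(
  id = c(1,2,3,4,5),
  snp1 = c(0.3,0.2,0.3,0.1,0.2),
  snp2 = c(0.5,0.8,0.2,0.3,0.3)
)

# mapply() multiplies the columns pairwise and returns a matrix,
# so convert it to a data frame before adding the id column
res <- as.data.frame(mapply(`*`, df[,-1], df2[,-1]))
res$id <- df$id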
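The sorting step mentioned above is not shown in the code, so here is one possible way to line the rows up first. It is a sketch under assumptions: both data frames contain the same IDs with no duplicates, and the names `geno`, `wts`, `idx`, `aligned` and `prod_mat` are illustrative only. It uses `match()` to reorder the second data frame to the row order of the first, then multiplies the SNP columns as numeric matrices, which may also be quicker than data frame arithmetic on a very large dataset.

```r
# Toy data: same IDs in both data frames, but in a different row order.
geno <- data.frame(id = c(1, 2, 3),
                   snp1 = c(0, 1, 1),
                   snp2 = c(1, 1, 0))
wts  <- data.frame(id = c(3, 1, 2),
                   snp1 = c(0.28, 0.65, 0.74),
                   snp2 = c(0.5, 1.3, 1.6))

# For each row of geno, find the row of wts with the same id.
idx <- match(geno$id, wts$id)
stopifnot(!anyNA(idx))  # every id in geno must also exist in wts

# Reorder wts to geno's row order, then multiply the non-id columns
# element-wise as numeric matrices.
aligned <- wts[idx, ]
prod_mat <- as.matrix(geno[-1]) * as.matrix(aligned[-1])

# Re-attach the id column to get a data frame back.
res <- data.frame(id = geno$id, prod_mat)
```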
