Replace value in one data frame from another
I've got two data.frames:
First:
> dput(head(tbl_mz))
structure(list(m.z = c(258.1686969, 258.168752, 587.8313625,
587.8425292, 523.2863282, 523.2859396), Measured.mass = c(514.3228408,
514.3229511, 1173.648172, 802.4706732, 1272.645144, 1044.557326
)), .Names = c("m.z", "Measured.mass"), row.names = c(NA, 6L), class = "data.frame")
Second:
> dput(head(tbl_exl))
structure(list(V1 = c(802.4706732, 1272.649209, 1272.646875,
1272.646599, 1272.646521, 1272.645144), V2 = c(NA, NA, NA, NA,
NA, NA), V3 = c(NA, NA, NA, NA, NA, NA), V4 = c(NA, NA, NA, NA,
NA, NA), V5 = c(NA, NA, NA, NA, NA, NA), V6 = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("", "Positive"), class = "factor"),
V7 = c(28.7, 29.4, 29.4, 23.8, 28.6, 23.3), V8 = c(30.7,
31.4, 31.4, 25.8, 30.6, 25.3), X = c(NA, NA, NA, NA, NA,
NA), X.1 = c(NA, NA, NA, NA, NA, NA), X.2 = c(NA, NA, NA,
NA, NA, NA)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6",
"V7", "V8", "X", "X.1", "X.2"), row.names = c(NA, 6L), class = "data.frame")
I would like to replace some values in column V1 of tbl_exl with values from the other table, tbl_mz. The values in column V1 (tbl_exl) can be found in the column Measured.mass (tbl_mz), and they should be replaced by the values from the adjacent column m.z in the tbl_mz data frame.
In other words, the values in V1 should be replaced by the corresponding m.z values.
The problem is that not all values from V1 can be found in the other data frame. Those which can't be found can be deleted or just left as they are.
The output I want to get:
> dput(head(tbl_exl_modified))
structure(list(V1 = c(587.8425292, 1272.649209, 1272.646875,
1272.646599, 1272.646521, 523.2863282), V2 = c(NA, NA, NA, NA,
NA, NA), V3 = c(NA, NA, NA, NA, NA, NA), V4 = c(NA, NA, NA, NA,
NA, NA), V5 = c(NA, NA, NA, NA, NA, NA), V6 = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("", "Positive"), class = "factor"),
V7 = c(28.7, 29.4, 29.4, 23.8, 28.6, 23.3), V8 = c(30.7,
31.4, 31.4, 25.8, 30.6, 25.3), X = c(NA, NA, NA, NA, NA,
NA), X.1 = c(NA, NA, NA, NA, NA, NA), X.2 = c(NA, NA, NA,
NA, NA, NA)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6",
"V7", "V8", "X", "X.1", "X.2"), row.names = c(NA, 6L), class = "data.frame")
You could try match. Create numeric indexes based on the match between the columns ("Measured.mass", "V1") of the two datasets. Remove the NA values (giving "indx1" and "indxN1") and replace the "V1" values with the "m.z" values based on these indexes.
indx <- match(tbl_mz$Measured.mass, tbl_exl$V1)
indx1 <- indx[!is.na(indx)]
indxN <- match(tbl_exl$V1, tbl_mz$Measured.mass)
indxN1 <- indxN[!is.na(indxN)]
tbl_exl$V1[indx1] <- tbl_mz$m.z[indxN1]
identical(tbl_exl, tbl_exl_modified)
#[1] TRUE
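Note that indx1 follows the row order of tbl_mz while indxN1 follows the row order of tbl_exl, so the two vectors pair up correctly only when the matches occur in the same order in both tables, as they do in the sample data. A minimal sketch of a variant that uses a single match call and so does not rely on that ordering; idx and found are names introduced here, and only two columns of tbl_exl are reproduced to keep it short:

```r
# Stand-ins for the data frames in the question (values copied from the dput output)
tbl_mz <- data.frame(
  m.z = c(258.1686969, 258.168752, 587.8313625,
          587.8425292, 523.2863282, 523.2859396),
  Measured.mass = c(514.3228408, 514.3229511, 1173.648172,
                    802.4706732, 1272.645144, 1044.557326)
)
tbl_exl <- data.frame(
  V1 = c(802.4706732, 1272.649209, 1272.646875,
         1272.646599, 1272.646521, 1272.645144),
  V7 = c(28.7, 29.4, 29.4, 23.8, 28.6, 23.3)
)

# For each row of tbl_exl: the row of tbl_mz with the same mass, or NA if none
idx <- match(tbl_exl$V1, tbl_mz$Measured.mass)
found <- !is.na(idx)

# Overwrite only the matched rows; unmatched rows keep their original V1
tbl_exl$V1[found] <- tbl_mz$m.z[idx[found]]
tbl_exl$V1
```

Because idx is computed per row of tbl_exl, each replaced value comes from the tbl_mz row that actually matched it, even if the same mass appears several times in V1.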
Or use left_join from dplyr:
library(dplyr)
tbl_exl1 <- left_join(tbl_exl, tbl_mz, by=c('V1'='Measured.mass')) %>%
    mutate(V1= pmax((NA^!is.na(m.z))*V1, m.z,
                    na.rm=TRUE)) %>%
    select(-m.z)
tbl_exl1
# V1 V2 V3 V4 V5 V6 V7 V8 X X.1 X.2
#1 587.8425 NA NA NA NA Positive 28.7 30.7 NA NA NA
#2 1272.6492 NA NA NA NA Positive 29.4 31.4 NA NA NA
#3 1272.6469 NA NA NA NA Positive 29.4 31.4 NA NA NA
#4 1272.6466 NA NA NA NA Positive 23.8 25.8 NA NA NA
#5 1272.6465 NA NA NA NA Positive 28.6 30.6 NA NA NA
#6 523.2863 NA NA NA NA Positive 23.3 25.3 NA NA NA
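The expression (NA^!is.na(m.z))*V1 relies on NA^0 being 1 and NA^1 being NA, so V1 is blanked out wherever a match was found and pmax then picks m.z. A sketch of the same idea written with dplyr's coalesce, which may be easier to read; tbl_exl2 is a name introduced here, and only two columns of tbl_exl are reproduced:

```r
library(dplyr)

# Stand-ins for the data frames in the question (values copied from the dput output)
tbl_mz <- data.frame(
  m.z = c(258.1686969, 258.168752, 587.8313625,
          587.8425292, 523.2863282, 523.2859396),
  Measured.mass = c(514.3228408, 514.3229511, 1173.648172,
                    802.4706732, 1272.645144, 1044.557326)
)
tbl_exl <- data.frame(
  V1 = c(802.4706732, 1272.649209, 1272.646875,
         1272.646599, 1272.646521, 1272.645144),
  V7 = c(28.7, 29.4, 29.4, 23.8, 28.6, 23.3)
)

tbl_exl2 <- tbl_exl %>%
  left_join(tbl_mz, by = c("V1" = "Measured.mass")) %>%
  mutate(V1 = coalesce(m.z, V1)) %>%  # m.z where a match exists, otherwise the old V1
  select(-m.z)
tbl_exl2$V1
```

As with any left_join, a mass that occurs more than once in tbl_mz$Measured.mass would duplicate the corresponding tbl_exl row, so this assumes the lookup column has unique values.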
Here's a solution using data.table's binary join:
library(data.table)
setnames(setDT(tbl_exl), 1, "Measured.mass") # Changing the first column name for the join to work
setkey(tbl_exl, Measured.mass) # Keying tbl_exl by `Measured.mass`
setkey(setDT(tbl_mz), Measured.mass) # Keying tbl_mz by `Measured.mass`
tbl_exl[tbl_mz, Measured.mass := i.m.z][] # Joining and retrieving only matched values from `i.m.z`
# Measured.mass V2 V3 V4 V5 V6 V7 V8 X X.1 X.2
# 1: 587.8425 NA NA NA NA Positive 28.7 30.7 NA NA NA
# 2: 523.2863 NA NA NA NA Positive 23.3 25.3 NA NA NA
# 3: 1272.6465 NA NA NA NA Positive 28.6 30.6 NA NA NA
# 4: 1272.6466 NA NA NA NA Positive 23.8 25.8 NA NA NA
# 5: 1272.6469 NA NA NA NA Positive 29.4 31.4 NA NA NA
# 6: 1272.6492 NA NA NA NA Positive 29.4 31.4 NA NA NA
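The keyed join above renames the first column of tbl_exl, and setkey sorts the table by it, which is why the rows come back in a different order. A sketch of an update join using the on= argument instead, which should leave both the column name and the row order untouched; dt_mz and dt_exl are names introduced here, and only two columns of tbl_exl are reproduced:

```r
library(data.table)

# Stand-ins for the tables in the question (values copied from the dput output)
dt_mz <- data.table(
  m.z = c(258.1686969, 258.168752, 587.8313625,
          587.8425292, 523.2863282, 523.2859396),
  Measured.mass = c(514.3228408, 514.3229511, 1173.648172,
                    802.4706732, 1272.645144, 1044.557326)
)
dt_exl <- data.table(
  V1 = c(802.4706732, 1272.649209, 1272.646875,
         1272.646599, 1272.646521, 1272.645144),
  V7 = c(28.7, 29.4, 29.4, 23.8, 28.6, 23.3)
)

# Update join: where dt_exl$V1 equals dt_mz$Measured.mass, overwrite V1
# by reference with the matching m.z (the i. prefix refers to dt_mz columns)
dt_exl[dt_mz, on = c(V1 = "Measured.mass"), V1 := i.m.z]
dt_exl$V1
```

Rows of dt_exl without a match are not touched, so they keep their original V1 values.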