Replace value in one data frame from another

I've got two data.frames:

First:

> dput(head(tbl_mz))
structure(list(m.z = c(258.1686969, 258.168752, 587.8313625, 
587.8425292, 523.2863282, 523.2859396), Measured.mass = c(514.3228408, 
514.3229511, 1173.648172, 802.4706732, 1272.645144, 1044.557326
)), .Names = c("m.z", "Measured.mass"), row.names = c(NA, 6L), class = "data.frame")

Second:

> dput(head(tbl_exl))
structure(list(V1 = c(802.4706732, 1272.649209, 1272.646875, 
1272.646599, 1272.646521, 1272.645144), V2 = c(NA, NA, NA, NA, 
NA, NA), V3 = c(NA, NA, NA, NA, NA, NA), V4 = c(NA, NA, NA, NA, 
NA, NA), V5 = c(NA, NA, NA, NA, NA, NA), V6 = structure(c(2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("", "Positive"), class = "factor"), 
    V7 = c(28.7, 29.4, 29.4, 23.8, 28.6, 23.3), V8 = c(30.7, 
    31.4, 31.4, 25.8, 30.6, 25.3), X = c(NA, NA, NA, NA, NA, 
    NA), X.1 = c(NA, NA, NA, NA, NA, NA), X.2 = c(NA, NA, NA, 
    NA, NA, NA)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", 
"V7", "V8", "X", "X.1", "X.2"), row.names = c(NA, 6L), class = "data.frame")

I would like to replace some values in column V1 of tbl_exl with values from the other table, tbl_mz. The values in column V1 (tbl_exl) can be found in column Measured.mass (tbl_mz), and they should be replaced by the corresponding values from the m.z column of tbl_mz.

In other words, the values in V1 should be replaced by the m.z values.

The problem is that not all values in V1 can be found in the other data frame. Those that cannot be found can either be deleted or left as they are.

The output I want to get:

> dput(head(tbl_exl_modified))
structure(list(V1 = c(587.8425292, 1272.649209, 1272.646875, 
1272.646599, 1272.646521, 523.2863282), V2 = c(NA, NA, NA, NA, 
NA, NA), V3 = c(NA, NA, NA, NA, NA, NA), V4 = c(NA, NA, NA, NA, 
NA, NA), V5 = c(NA, NA, NA, NA, NA, NA), V6 = structure(c(2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("", "Positive"), class = "factor"), 
    V7 = c(28.7, 29.4, 29.4, 23.8, 28.6, 23.3), V8 = c(30.7, 
    31.4, 31.4, 25.8, 30.6, 25.3), X = c(NA, NA, NA, NA, NA, 
    NA), X.1 = c(NA, NA, NA, NA, NA, NA), X.2 = c(NA, NA, NA, 
    NA, NA, NA)), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", 
"V7", "V8", "X", "X.1", "X.2"), row.names = c(NA, 6L), class = "data.frame")

You could try match. match(tbl_exl$V1, tbl_mz$Measured.mass) returns, for each value in V1, the row of tbl_mz that has the same Measured.mass, or NA when there is no such row. Keep only the non-NA positions and replace those V1 values with the corresponding m.z values. Using the same logical vector on both sides of the assignment keeps each replaced value paired with its own match, whatever the row order of the two tables.

indx <- match(tbl_exl$V1, tbl_mz$Measured.mass)  # row of tbl_mz for each V1 value, or NA
found <- !is.na(indx)                             # TRUE where a match exists
tbl_exl$V1[found] <- tbl_mz$m.z[indx[found]]      # unmatched values stay as they are

identical(tbl_exl, tbl_exl_modified)
#[1] TRUE
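A small self-contained sketch of the match() idea on made-up toy vectors (exl_v1, mz_mass and mz_mz are hypothetical names standing in for tbl_exl$V1, tbl_mz$Measured.mass and tbl_mz$m.z). The lookup table is deliberately in a different order from exl_v1, to show that indexing both sides with one mask still pairs each value with its own match.

```r
# Toy data (hypothetical values): masses to look up, and a lookup table
# whose rows are in a different order than exl_v1.
exl_v1  <- c(802.47, 1272.65, 1272.64, 999.99)
mz_mass <- c(1272.64, 500.00, 802.47)  # plays the role of tbl_mz$Measured.mass
mz_mz   <- c(523.29, 250.00, 587.84)   # plays the role of tbl_mz$m.z

# Position of each exl_v1 value in mz_mass, or NA when it is absent
idx <- match(exl_v1, mz_mass)
print(idx)  # [1]  3 NA  1 NA

# One logical mask indexes both the target and the positions,
# so every replaced value comes from its own matching row.
found <- !is.na(idx)
exl_v1[found] <- mz_mz[idx[found]]
print(exl_v1)  # values: 587.84, 1272.65, 523.29, 999.99
```

Note that match() compares doubles exactly, which works here because both tables carry the same stored values.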

Or use left_join from dplyr:

library(dplyr)
# Attach the matching m.z to each row, then keep m.z where it exists and V1 otherwise
tbl_exl1 <- left_join(tbl_exl, tbl_mz, by = c('V1' = 'Measured.mass')) %>%
  mutate(V1 = pmax((NA^!is.na(m.z)) * V1, m.z, na.rm = TRUE)) %>%
  select(-m.z)

tbl_exl1
#        V1 V2 V3 V4 V5       V6   V7   V8  X X.1 X.2
#1  587.8425 NA NA NA NA Positive 28.7 30.7 NA  NA  NA
#2 1272.6492 NA NA NA NA Positive 29.4 31.4 NA  NA  NA
#3 1272.6469 NA NA NA NA Positive 29.4 31.4 NA  NA  NA
#4 1272.6466 NA NA NA NA Positive 23.8 25.8 NA  NA  NA
#5 1272.6465 NA NA NA NA Positive 28.6 30.6 NA  NA  NA
#6  523.2863 NA NA NA NA Positive 23.3 25.3 NA  NA  NA
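The (NA^!is.na(m.z)) * V1 expression in the dplyr answer relies on R evaluating NA^1 as NA and NA^0 as 1. A base R sketch on hypothetical toy columns (shaped like the result of the left_join, where m.z is NA for rows without a partner) shows what each step produces, and compares it with a more explicit ifelse() version of the same rule:

```r
# Toy columns as they might look after the left_join (hypothetical values):
# m.z is NA wherever V1 had no partner in tbl_mz.
V1  <- c(802.47, 1272.65, 1272.64)
m.z <- c(587.84, NA, 523.29)

# NA^TRUE is NA and NA^FALSE is 1, so the product is NA where a match
# exists and the original V1 value where it does not.
masked <- (NA^!is.na(m.z)) * V1
print(masked)  # values: NA, 1272.65, NA

# pmax() with na.rm = TRUE keeps whichever of the two is not NA
via_pmax <- pmax(masked, m.z, na.rm = TRUE)

# A more explicit base R equivalent of the same rule
via_ifelse <- ifelse(is.na(m.z), V1, m.z)
identical(via_pmax, via_ifelse)  # TRUE
```

Inside the dplyr pipeline, mutate(V1 = coalesce(m.z, V1)) expresses the same rule. Keep in mind that left_join() repeats a row of tbl_exl when its mass occurs more than once in tbl_mz.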

Here's a solution using a data.table binary join. Note that setkey() sorts tbl_exl by mass, so the rows come back in that order rather than in their original order.

library(data.table)
setnames(setDT(tbl_exl), 1, "Measured.mass") # Changing the first column name for the join to work
setkey(tbl_exl, Measured.mass) # Keying tbl_exl by `Measured.mass`
setkey(setDT(tbl_mz), Measured.mass) # Converting tbl_mz to a data.table and keying it by `Measured.mass`
tbl_exl[tbl_mz, Measured.mass := i.m.z][] # Update join: overwrite matched masses with `i.m.z`; the trailing [] prints the result
#    Measured.mass V2 V3 V4 V5       V6   V7   V8  X X.1 X.2
# 1:      587.8425 NA NA NA NA Positive 28.7 30.7 NA  NA  NA
# 2:      523.2863 NA NA NA NA Positive 23.3 25.3 NA  NA  NA
# 3:     1272.6465 NA NA NA NA Positive 28.6 30.6 NA  NA  NA
# 4:     1272.6466 NA NA NA NA Positive 23.8 25.8 NA  NA  NA
# 5:     1272.6469 NA NA NA NA Positive 29.4 31.4 NA  NA  NA
# 6:     1272.6492 NA NA NA NA Positive 29.4 31.4 NA  NA  NA
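If the original row order and column name matter, the same update join can be written with the on= argument instead of setkey(), which avoids both the renaming and the sort. A sketch on hypothetical toy tables (exl and mz stand in for tbl_exl and tbl_mz), assuming a data.table version that supports on= joins:

```r
library(data.table)

# Toy tables (hypothetical values); exl$V1 plays the role of tbl_exl$V1
exl <- data.table(V1 = c(802.47, 1272.65, 1272.64), V7 = c(28.7, 29.4, 23.3))
mz  <- data.table(m.z = c(587.84, 523.29), Measured.mass = c(802.47, 1272.64))

# Update join: rows of exl whose V1 equals a Measured.mass in mz get V1
# overwritten with that row's m.z. No key is set, so the row order is kept,
# and rows without a match are left untouched.
exl[mz, on = c(V1 = "Measured.mass"), V1 := i.m.z]
print(exl)
```

On the question's data this would presumably read setDT(tbl_exl)[setDT(tbl_mz), on = c(V1 = "Measured.mass"), V1 := i.m.z], with no setnames() or setkey() calls.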


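Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.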

Related questions
- Is there method in R to replace values in one data frame with a related value from another data frame?
- Replace values in one data frame from values in another data frame
- Add value from one data frame into another data frame in R
- How to replace the data frame value with another data frame by matching one column value in R?
- Merge/replace row name with colmn value from another data frame
- Replace a value by a value another data-frame
- creating a new data frame by extracting columns from one data frame based on the value of column in another data frame
- Conditionally replace matching values from one data.frame to values in another data.frame
- Replace index values of one data frame with dates of another data frame
- Replace rows in one data frame if they appear in another data frame
 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM