R - replace values in dataframe based on two matching conditions

I'm working with lists of spatial data for 20+ different sites (difficult to reproduce here; sorry in advance). I have three data frames associated with each site; each has a 'sample_ID' column and some other shared column names.

What I'm trying to do seems very simple: if the 'sample_ID' values match between two data frames and the column names match, replace the value in DF 1 with the value from DF 2 or DF 3. Example:

# DF 1:
SAMPLE_ID  CLASS_ID  CLASS  VALUE
    1         0        0      5
    2         0        0      5
    3         0        0      3
    4         0        0      6
    5         0        0      6
    6         0        0      3

# DF 2
SAMPLE_ID  REF_VAL  CLASS_ID  CLASS
    1        33        2      cloud
    2        45        3      water
    3        NA        3      water
    4        NA        4      forest

# DF 3
SAMPLE_ID  CLASS_ID  CLASS  STRATA
    5         3       NA      20
    6         3      water    19

Desired output:

# DF 1:
SAMPLE_ID  CLASS_ID  CLASS  VALUE
    1         2      cloud    5
    2         3      water    5
    3         3      water    3
    4         4      forest   6
    5         3       NA      6
    6         3      water    3

All I can think to do is some sort of match indexing, like:

List1$CLASS_ID <- List2$CLASS_ID[match(List1$SAMPLE_ID, List2$SAMPLE_ID)]
List1$CLASS_ID <- List3$CLASS_ID[match(List1$SAMPLE_ID, List3$SAMPLE_ID)]

But this doesn't work. For one, it produces NAs for the non-matching values (I attempted a nested match within nomatch =, but that didn't work either). More importantly, I really need to streamline this by referencing all the matching column names rather than going one at a time, since the actual data has 10+ columns that need replacement. Also important: I need the blank NA values to transfer over as well.

Any thoughts?

With base R you can do:

vars <- c("SAMPLE_ID", "CLASS_ID", "CLASS")
dt23 <- rbind(dt2[, vars], dt3[, vars])
m <- merge(dt1[, c("SAMPLE_ID","VALUE")], dt23, by="SAMPLE_ID", all.x=TRUE)
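The same merge can be written without listing the columns by hand, which may help when 10+ columns need replacing. This is only a sketch: it rebuilds the example data with data.frame(), and the names key and keep are introduced here for illustration. It assumes every column shared by all three data frames (other than SAMPLE_ID) should be replaced.

```r
# Example data rebuilt from the question
dt1 <- data.frame(SAMPLE_ID = 1:6, CLASS_ID = 0L, CLASS = "0",
                  VALUE = c(5L, 5L, 3L, 6L, 6L, 3L), stringsAsFactors = FALSE)
dt2 <- data.frame(SAMPLE_ID = 1:4, REF_VAL = c(33L, 45L, NA, NA),
                  CLASS_ID = c(2L, 3L, 3L, 4L),
                  CLASS = c("cloud", "water", "water", "forest"),
                  stringsAsFactors = FALSE)
dt3 <- data.frame(SAMPLE_ID = 5:6, CLASS_ID = c(3L, 3L),
                  CLASS = c(NA, "water"), STRATA = c(20L, 19L),
                  stringsAsFactors = FALSE)

key  <- "SAMPLE_ID"
# Columns present in all three data frames (includes the key)
vars <- Reduce(intersect, list(names(dt1), names(dt2), names(dt3)))
# Stack the replacement rows from dt2 and dt3
dt23 <- rbind(dt2[, vars], dt3[, vars])
# Keep the key plus the columns of dt1 that are NOT being replaced
keep <- c(key, setdiff(names(dt1), vars))
m <- merge(dt1[, keep], dt23, by = key, all.x = TRUE)
# Restore the original column order of dt1
m <- m[, names(dt1)]
```

Note that with all.x = TRUE, any row of dt1 whose SAMPLE_ID appears in neither dt2 nor dt3 ends up with NA in the replaced columns instead of keeping its original value.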

I would bind DT2 and DT3 then execute a join:

library(dplyr)

dt1 <- read.table(text = "
SAMPLE_ID  CLASS_ID  CLASS  VALUE
1         0        0      5
2         0        0      5
3         0        0      3
4         0        0      6
5         0        0      6
6         0        0      3
", header = TRUE, stringsAsFactors = FALSE)

dt2 <- read.table(text = "
SAMPLE_ID  REF_VAL  CLASS_ID  CLASS
1        33        2      cloud
2        45        3      water
3        NA        3      water
4        NA        4      forest
", header = TRUE, stringsAsFactors = FALSE)

dt3 <- read.table(text = "
SAMPLE_ID  CLASS_ID  CLASS  STRATA
5         3       NA      20
6         3      water    19
", header = TRUE, stringsAsFactors = FALSE)

dt <- dt1[,c("SAMPLE_ID", "VALUE")]
dt <- left_join(dt, bind_rows(dt2, dt3), by = "SAMPLE_ID")  # explicit key avoids the "Joining, by" message
dt <- select(dt, SAMPLE_ID, CLASS_ID, CLASS, VALUE)

  SAMPLE_ID CLASS_ID  CLASS VALUE
1         1        2  cloud     5
2         2        3  water     5
3         3        3  water     3
4         4        4 forest     6
5         5        3   <NA>     6
6         6        3  water     3

You have a couple of options, depending on the rest of your application.

Join

You could select ahead of time the columns you'll be replacing, remove them from the original dataset, and dplyr::left_join the new data on:

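# Stack df2 and df3 first; joining them one after the other would create
# duplicated CLASS_ID.x / CLASS_ID.y (and CLASS.x / CLASS.y) columns.
df1 %>% select(-CLASS_ID, -CLASS) %>%
        left_join(bind_rows(df2, df3) %>% select(SAMPLE_ID, CLASS_ID, CLASS),
                  by = "SAMPLE_ID")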

But if you want to keep values from the original CLASS and CLASS_ID, you can use left_join without removing them, and then use dplyr::coalesce to fill the new columns from the old columns wherever the new value is missing. You might have to use mutate_at or mutate_if, which are described here: http://dplyr.tidyverse.org/reference/summarise_all.html.
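A minimal sketch of the coalesce variant follows. The data frame new_vals and the ".new" suffix are made up for illustration, and CLASS is stored as character in both tables because coalesce needs compatible types. The sketch also shows a caveat: an NA in the new data does not overwrite the old value, so this variant does not transfer NAs.

```r
library(dplyr)

df1 <- data.frame(SAMPLE_ID = 1:6, CLASS_ID = 0L, CLASS = "0",
                  VALUE = c(5L, 5L, 3L, 6L, 6L, 3L), stringsAsFactors = FALSE)
# Hypothetical replacement values; sample 5 has a missing CLASS
new_vals <- data.frame(SAMPLE_ID = c(1L, 2L, 5L),
                       CLASS_ID  = c(2L, 3L, 3L),
                       CLASS     = c("cloud", "water", NA),
                       stringsAsFactors = FALSE)

out <- df1 %>%
  # Keep the old columns under their own names; new ones get a ".new" suffix
  left_join(new_vals, by = "SAMPLE_ID", suffix = c("", ".new")) %>%
  # Take the new value when present, otherwise fall back to the old one
  mutate(CLASS_ID = coalesce(CLASS_ID.new, CLASS_ID),
         CLASS    = coalesce(CLASS.new, CLASS)) %>%
  select(-ends_with(".new"))
```

Here sample 5 gets the new CLASS_ID of 3 but keeps its old CLASS of "0", because the new CLASS is NA and coalesce falls back to the old value.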

Or, your original idea:

The bit you were missing is that you want to select the matched IDs on both sides of the assignment. Also, %in% usually works well in these cases:

hit <- df1$SAMPLE_ID %in% df2$SAMPLE_ID  # rows of df1 that have a match in df2
df1[hit, c("CLASS_ID", "CLASS")] <- df2[match(df1$SAMPLE_ID[hit], df2$SAMPLE_ID), c("CLASS_ID", "CLASS")]
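To cover all shared columns and both source tables at once, the same idea could be wrapped in a small function. This is a sketch only: update_shared is a hypothetical helper, not part of base R, and the example data is rebuilt here with data.frame(). Because whole rows are copied, NA values in the source are transferred as well.

```r
df1 <- data.frame(SAMPLE_ID = 1:6, CLASS_ID = 0L, CLASS = "0",
                  VALUE = c(5L, 5L, 3L, 6L, 6L, 3L), stringsAsFactors = FALSE)
df2 <- data.frame(SAMPLE_ID = 1:4, REF_VAL = c(33L, 45L, NA, NA),
                  CLASS_ID = c(2L, 3L, 3L, 4L),
                  CLASS = c("cloud", "water", "water", "forest"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(SAMPLE_ID = 5:6, CLASS_ID = c(3L, 3L),
                  CLASS = c(NA, "water"), STRATA = c(20L, 19L),
                  stringsAsFactors = FALSE)

# Hypothetical helper: for rows of `target` whose key appears in `source`,
# overwrite every column the two data frames share (except the key itself).
update_shared <- function(target, source, key = "SAMPLE_ID") {
  cols <- setdiff(intersect(names(target), names(source)), key)
  idx  <- match(target[[key]], source[[key]])  # row in source, NA if no match
  hit  <- !is.na(idx)                          # rows of target to update
  target[hit, cols] <- source[idx[hit], cols]
  target
}

df1 <- update_shared(df1, df2)
df1 <- update_shared(df1, df3)
```

Rows of df1 with no match in either source keep their original values, and columns that exist only in df2 or df3 (REF_VAL, STRATA) are ignored.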
