R - Replace values in a data frame based on two matching conditions

Question

I'm working with lists of spatial data for 20+ different sites (difficult to reproduce here; sorry in advance). I have three data frames associated with each site; each has a 'sample_ID' column and some other shared column names.

What I'm trying to do seems very simple: if the 'sample_ID' values match between two data frames and the column names match, replace the value in DF 1 with the value from DF 2 or DF 3. Example:

# DF 1:
SAMPLE_ID  CLASS_ID  CLASS  VALUE
    1         0        0      5
    2         0        0      5
    3         0        0      3
    4         0        0      6
    5         0        0      6
    6         0        0      3

# DF 2
SAMPLE_ID  REF_VAL  CLASS_ID  CLASS
    1        33        2      cloud
    2        45        3      water
    3        NA        3      water
    4        NA        4      forest

# DF 3
SAMPLE_ID  CLASS_ID  CLASS  STRATA
    5         3       NA      20
    6         3      water    19

Desired output:

# DF 1:
SAMPLE_ID  CLASS_ID  CLASS  VALUE
    1         2      cloud    5
    2         3      water    5
    3         3      water    3
    4         4      forest   6
    5         3       NA      6
    6         3      water    3

All I can think to do is some sort of match indexing, like:

List1$CLASS_ID <- List2$CLASS_ID[match(List1$SAMPLE_ID, List2$SAMPLE_ID)]
List1$CLASS_ID <- List3$CLASS_ID[match(List1$SAMPLE_ID, List3$SAMPLE_ID)]

But this doesn't work. For one, it produces NAs for the nomatch values (I attempted a nested match within nomatch =, but that didn't work either). More importantly, I really need to streamline this by referencing all the matching column names rather than going one at a time, since the actual data has 10+ columns that need replacement. Also important: I need the blank NA values to transfer over as well.

Any thoughts?

Answer 1

With base R you can do:

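vars <- c("SAMPLE_ID", "CLASS_ID", "CLASS")
# Stack the replacement values from dt2 and dt3
dt23 <- rbind(dt2[, vars], dt3[, vars])
# Keep only the key and VALUE from dt1, then pull in the new CLASS_ID and CLASS
m <- merge(dt1[, c("SAMPLE_ID", "VALUE")], dt23, by = "SAMPLE_ID", all.x = TRUE)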
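Since the question mentions 10+ shared columns, the hard-coded vars could instead be derived from the column names. The following is only a sketch under assumptions: the toy data frames mirror the question's example, and every SAMPLE_ID in dt1 is assumed to appear in dt2 or dt3 (unmatched rows would otherwise end up with NA in the replaced columns).

```r
# Toy data mirroring the question's example (assumed; real data has more columns)
dt1 <- data.frame(SAMPLE_ID = 1:6, CLASS_ID = 0L, CLASS = "0",
                  VALUE = c(5, 5, 3, 6, 6, 3), stringsAsFactors = FALSE)
dt2 <- data.frame(SAMPLE_ID = 1:4, REF_VAL = c(33, 45, NA, NA),
                  CLASS_ID = c(2L, 3L, 3L, 4L),
                  CLASS = c("cloud", "water", "water", "forest"),
                  stringsAsFactors = FALSE)
dt3 <- data.frame(SAMPLE_ID = 5:6, CLASS_ID = c(3L, 3L),
                  CLASS = c(NA, "water"), STRATA = c(20, 19),
                  stringsAsFactors = FALSE)

# Columns present in all three data frames, including the key
vars <- Reduce(intersect, list(names(dt1), names(dt2), names(dt3)))
# The key plus the columns that exist only in dt1 and must be kept untouched
keep <- c("SAMPLE_ID", setdiff(names(dt1), vars))

dt23 <- rbind(dt2[vars], dt3[vars])
m <- merge(dt1[keep], dt23, by = "SAMPLE_ID", all.x = TRUE)
m <- m[, names(dt1)]  # restore dt1's original column order
```

Because the old CLASS_ID and CLASS columns are dropped before the merge, NA values in dt2 and dt3 carry over as requested.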

Answer 2

I would bind DT2 and DT3, then execute a join:

library(dplyr)

dt1 <- read.table(text = "
SAMPLE_ID  CLASS_ID  CLASS  VALUE
1         0        0      5
2         0        0      5
3         0        0      3
4         0        0      6
5         0        0      6
6         0        0      3
", header = TRUE, stringsAsFactors = FALSE)

dt2 <- read.table(text = "
SAMPLE_ID  REF_VAL  CLASS_ID  CLASS
1        33        2      cloud
2        45        3      water
3        NA        3      water
4        NA        4      forest
", header = TRUE, stringsAsFactors = FALSE)

dt3 <- read.table(text = "
SAMPLE_ID  CLASS_ID  CLASS  STRATA
5         3       NA      20
6         3      water    19
", header = TRUE, stringsAsFactors = FALSE)

dt <- dt1[,c("SAMPLE_ID", "VALUE")]
dt <- left_join(dt, dplyr::bind_rows(dt2, dt3), by = "SAMPLE_ID")
dt <- select(dt, SAMPLE_ID, CLASS_ID, CLASS, VALUE)

  SAMPLE_ID CLASS_ID  CLASS VALUE
1         1        2  cloud     5
2         2        3  water     5
3         3        3  water     3
4         4        4 forest     6
5         5        3   <NA>     6
6         6        3  water     3

Answer 3

You have a couple of options, depending on the rest of your application.

Join

You could select ahead of time the columns you'll be replacing, remove them from the original dataset, and use dplyr::left_join to join the new data on:

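library(dplyr)
# Stack df2 and df3 first: joining them one after the other would create
# suffixed duplicates (CLASS_ID.x, CLASS_ID.y) because both carry these columns.
new_vals <- bind_rows(df2, df3) %>% select(SAMPLE_ID, CLASS_ID, CLASS)
df1 %>% select(-CLASS_ID, -CLASS) %>%
        left_join(new_vals, by = "SAMPLE_ID")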

But if you want to keep values from the original CLASS and CLASS_ID, you can use left_join without removing them, and then use dplyr::coalesce to update the new columns based on the old columns. You might have to use mutate_at or mutate_if, which are described here: http://dplyr.tidyverse.org/reference/summarise_all.html .
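As a rough illustration of the coalesce approach, here is a minimal sketch. The toy data and the ".new" suffix are assumptions made for the example, and CLASS is stored as character in both data frames so that coalesce can combine the columns.

```r
library(dplyr)

# Toy data (assumed for illustration)
df1 <- data.frame(SAMPLE_ID = 1:3, CLASS_ID = c(0L, 0L, 0L),
                  CLASS = c("0", "0", "0"), stringsAsFactors = FALSE)
df2 <- data.frame(SAMPLE_ID = 1:2, CLASS_ID = c(2L, 3L),
                  CLASS = c("cloud", NA), stringsAsFactors = FALSE)

out <- df1 %>%
  # Incoming columns get a ".new" suffix; df1's columns keep their names
  left_join(df2, by = "SAMPLE_ID", suffix = c("", ".new")) %>%
  mutate(
    CLASS_ID = coalesce(CLASS_ID.new, CLASS_ID),  # new value if present, else old
    CLASS    = coalesce(CLASS.new, CLASS)
  ) %>%
  select(-ends_with(".new"))
```

Note that coalesce falls back to the old value whenever the new one is NA, so in this sketch sample 2 keeps its original CLASS. That is the opposite of transferring NAs over, so this variant only fits when the original values should survive.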

Or, your original idea:

The bit you were missing is that you want to select the matched IDs on both sides of the assignment. Also, %in% usually works well in these cases:

hit <- df1$SAMPLE_ID %in% df2$SAMPLE_ID  # rows of df1 that have a match in df2
df1[hit, c("CLASS_ID", "CLASS")] <- df2[match(df1$SAMPLE_ID[hit], df2$SAMPLE_ID), c("CLASS_ID", "CLASS")]
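To cover every shared column without naming each one, the same indexing idea could be wrapped in a small helper. This is only a sketch: update_shared is a hypothetical function written for this example, not part of any package, and the toy data is assumed (df2 is deliberately out of order to show that row order does not matter).

```r
# Hypothetical helper: overwrite values in `target` with those from `source`
# for every matching key and every shared column name (NAs are copied too).
update_shared <- function(target, source, key = "SAMPLE_ID") {
  cols <- setdiff(intersect(names(target), names(source)), key)
  pos  <- match(target[[key]], source[[key]])  # row in source for each target row
  hit  <- !is.na(pos)                          # target rows that have a match
  for (col in cols) {
    target[[col]][hit] <- source[[col]][pos[hit]]
  }
  target
}

# Toy data (assumed)
df1 <- data.frame(SAMPLE_ID = 1:6, CLASS_ID = 0L, CLASS = "0",
                  VALUE = c(5, 5, 3, 6, 6, 3), stringsAsFactors = FALSE)
df2 <- data.frame(SAMPLE_ID = c(4L, 2L, 1L, 3L), CLASS_ID = c(4L, 3L, 2L, 3L),
                  CLASS = c("forest", "water", "cloud", "water"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(SAMPLE_ID = 5:6, CLASS_ID = c(3L, 3L),
                  CLASS = c(NA, "water"), STRATA = c(20, 19),
                  stringsAsFactors = FALSE)

result <- update_shared(update_shared(df1, df2), df3)
```

Rows of df1 without a match keep their original values instead of becoming NA. If the per-site data frames are stored in parallel lists, something like Map(update_shared, List1, List2) might apply the helper to each site, assuming the lists are in the same site order.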
