R - replace values in dataframe based on two matching conditions
I'm working with lists of spatial data for 20+ different sites (difficult to reproduce here; sorry in advance). I have three data frames associated with each site; each has a 'sample_ID' column and some other shared column names.
What I'm trying to do seems very simple: if the 'sample_ID' values match between two data frames and the column names match, replace the value in DF 1 with that of DF 2 or DF 3. Example:
# DF 1:
SAMPLE_ID CLASS_ID CLASS VALUE
1 0 0 5
2 0 0 5
3 0 0 3
4 0 0 6
5 0 0 6
6 0 0 3
# DF 2
SAMPLE_ID REF_VAL CLASS_ID CLASS
1 33 2 cloud
2 45 3 water
3 NA 3 water
4 NA 4 forest
# DF 3
SAMPLE_ID CLASS_ID CLASS STRATA
5 3 NA 20
6 3 water 19
Desired output:
# DF 1:
SAMPLE_ID CLASS_ID CLASS VALUE
1 2 cloud 5
2 3 water 5
3 3 water 3
4 4 forest 6
5 3 NA 6
6 3 water 3
All I can think to do is some sort of match indexing, like:
List1$CLASS_ID <- List2$CLASS_ID[match(List1$SAMPLE_ID, List2$SAMPLE_ID)]
List1$CLASS_ID <- List3$CLASS_ID[match(List1$SAMPLE_ID, List3$SAMPLE_ID)]
But this doesn't work. For one, it produces NAs for the IDs that have no match (I attempted a nested match within the nomatch = argument, but that didn't work either). More importantly, I really need to streamline this by referencing all the matching column names rather than going one at a time, since the actual data has 10+ columns that need replacement. Also important: I need the blank NA values to transfer over as well.
Any thoughts?
With base R you can do:
vars <- c("SAMPLE_ID", "CLASS_ID", "CLASS")
dt23 <- rbind(dt2[, vars], dt3[, vars])
m <- merge(dt1[, c("SAMPLE_ID","VALUE")], dt23, by="SAMPLE_ID", all.x=TRUE)
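Since the real data has 10+ columns to replace, the shared column names could be computed instead of typed out. The following is only a minimal sketch of that idea, assuming toy data shaped like the example in the question; the names `shared`, `keep`, and `out` are my own and not part of the answer above.

```r
# Toy data mirroring the question's example (CLASS stored as character).
dt1 <- data.frame(SAMPLE_ID = 1:6, CLASS_ID = 0L, CLASS = "0",
                  VALUE = c(5, 5, 3, 6, 6, 3), stringsAsFactors = FALSE)
dt2 <- data.frame(SAMPLE_ID = 1:4, REF_VAL = c(33, 45, NA, NA),
                  CLASS_ID = c(2L, 3L, 3L, 4L),
                  CLASS = c("cloud", "water", "water", "forest"),
                  stringsAsFactors = FALSE)
dt3 <- data.frame(SAMPLE_ID = 5:6, CLASS_ID = c(3L, 3L),
                  CLASS = c(NA, "water"), STRATA = c(20, 19),
                  stringsAsFactors = FALSE)

# Columns present in all three data frames (this includes the key SAMPLE_ID).
shared <- Reduce(intersect, list(names(dt1), names(dt2), names(dt3)))

# Columns of dt1 to keep untouched: the key plus everything not being replaced.
keep <- c("SAMPLE_ID", setdiff(names(dt1), shared))

# Stack the replacement values, then join them onto the kept columns.
dt23 <- rbind(dt2[, shared], dt3[, shared])
out <- merge(dt1[, keep], dt23, by = "SAMPLE_ID", all.x = TRUE)
out <- out[, names(dt1)]  # restore dt1's original column order
print(out)
```

Because the replaced columns are dropped from `dt1` before the merge, NA values in `dt2` or `dt3` carry over unchanged, which is what the question asks for.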
I would bind DT2 and DT3, then execute a join:
library(dplyr)
dt1 <- read.table(text = "
SAMPLE_ID CLASS_ID CLASS VALUE
1 0 0 5
2 0 0 5
3 0 0 3
4 0 0 6
5 0 0 6
6 0 0 3
", header = TRUE, stringsAsFactors = FALSE)
dt2 <- read.table(text = "
SAMPLE_ID REF_VAL CLASS_ID CLASS
1 33 2 cloud
2 45 3 water
3 NA 3 water
4 NA 4 forest
", header = TRUE, stringsAsFactors = FALSE)
dt3 <- read.table(text = "
SAMPLE_ID CLASS_ID CLASS STRATA
5 3 NA 20
6 3 water 19
", header = TRUE, stringsAsFactors = FALSE)
dt <- dt1[,c("SAMPLE_ID", "VALUE")]
dt <- left_join(dt, dplyr::bind_rows(dt2, dt3), by = "SAMPLE_ID")  # explicit key avoids the "Joining, by" message
dt <- select(dt, SAMPLE_ID, CLASS_ID, CLASS, VALUE)
SAMPLE_ID CLASS_ID CLASS VALUE
1 1 2 cloud 5
2 2 3 water 5
3 3 3 water 3
4 4 4 forest 6
5 5 3 <NA> 6
6 6 3 water 3
You have a couple of options, depending on the rest of your application.
You could select ahead of time the columns you'll be replacing, remove them from the original dataset, and dplyr::left_join the new data on:
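library(dplyr)
# Stack df2 and df3 first: two separate joins would yield CLASS_ID.x/.y columns.
new_vals <- bind_rows(df2, df3) %>% select(SAMPLE_ID, CLASS_ID, CLASS)
df1 %>% select(-CLASS_ID, -CLASS) %>%
  left_join(new_vals, by = "SAMPLE_ID")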
But if you want to keep values from the original CLASS and CLASS_ID, you can use left_join without removing them, and then use dplyr::coalesce to update the new columns based on the old columns. You might have to use mutate_at or mutate_if, which are described here: http://dplyr.tidyverse.org/reference/summarise_all.html .
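As a rough illustration of the coalesce route, here is a minimal sketch. The toy data and the `.new` suffix are made up for this example and are not from the question. Note that coalesce falls back to the original value wherever the new one is NA, so NAs do not transfer over with this approach, unlike what the question asks for.

```r
library(dplyr)

# Made-up toy data; CLASS is character in both frames because coalesce()
# needs compatible types.
df1 <- data.frame(SAMPLE_ID = 1:4, CLASS_ID = 0L, CLASS = "0",
                  VALUE = c(5, 5, 3, 6), stringsAsFactors = FALSE)
df2 <- data.frame(SAMPLE_ID = 1:3, CLASS_ID = c(2L, 3L, NA),
                  CLASS = c("cloud", NA, "water"), stringsAsFactors = FALSE)

res <- df1 %>%
  # Overlapping columns from df2 get the suffix ".new"; df1's keep their names.
  left_join(df2, by = "SAMPLE_ID", suffix = c("", ".new")) %>%
  mutate(
    # Take the new value where there is one, otherwise keep the original.
    CLASS_ID = coalesce(CLASS_ID.new, CLASS_ID),
    CLASS    = coalesce(CLASS.new, CLASS)
  ) %>%
  select(-CLASS_ID.new, -CLASS.new)

print(res)
```

With many columns, the same pattern could be applied per column in a loop or with the scoped mutate variants mentioned above; the two explicit assignments here are just the simplest form.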
The bit you were missing is that you want to select the matched IDs on both sides of the assignment. Also, %in% usually works well in these cases:
hit <- df1$SAMPLE_ID %in% df2$SAMPLE_ID  # rows of df1 that have a match in df2
df1[hit, c("CLASS_ID", "CLASS")] <- df2[match(df1$SAMPLE_ID[hit], df2$SAMPLE_ID), c("CLASS_ID", "CLASS")]  # match() aligns df2's rows to df1's IDs
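To cover every shared column and more than one lookup table, the same matched-row assignment could be wrapped in a small function. This is only a sketch under assumptions: `replace_matched` is a hypothetical helper of my own, the key column is assumed to be `SAMPLE_ID`, and the toy data mirrors the question's example.

```r
# Hypothetical helper (not from the answer): overwrite every column that
# `target` shares with `lookup`, on the rows where the key matches.
replace_matched <- function(target, lookup, key = "SAMPLE_ID") {
  cols <- setdiff(intersect(names(target), names(lookup)), key)
  pos  <- match(target[[key]], lookup[[key]])  # row in lookup, NA if no match
  hit  <- !is.na(pos)
  # Assign only on matched rows, so unmatched rows keep their old values,
  # while NA values stored in lookup are copied over as-is.
  target[hit, cols] <- lookup[pos[hit], cols]
  target
}

df1 <- data.frame(SAMPLE_ID = 1:6, CLASS_ID = 0L, CLASS = "0",
                  VALUE = c(5, 5, 3, 6, 6, 3), stringsAsFactors = FALSE)
df2 <- data.frame(SAMPLE_ID = 1:4, REF_VAL = c(33, 45, NA, NA),
                  CLASS_ID = c(2L, 3L, 3L, 4L),
                  CLASS = c("cloud", "water", "water", "forest"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(SAMPLE_ID = 5:6, CLASS_ID = c(3L, 3L),
                  CLASS = c(NA, "water"), STRATA = c(20, 19),
                  stringsAsFactors = FALSE)

# Apply the lookups one after another, starting from df1.
result <- Reduce(replace_matched, list(df2, df3), df1)
print(result)
```

Restricting the assignment to matched rows is what avoids the NAs that plain match indexing produced in the question's attempt.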