I'm trying to build a script in R that will join on different fields based on user input. I'm running version 0.7.6 of dplyr through tidyverse (1.2.1).
I could build multiple mostly identical join statements and pick one based on the input, but that seems inelegant. Below is an example, followed by some commentary. I'm still fairly new to R, so apologies if the example itself is inelegant:
```r
library(tidyverse)

df <- tibble(
  a = letters[1:20],
  b = c(1:5, 1:5, 1:5, 1:5)
)

ref <- tibble(
  let_ref_col = c('e', 'g', 'b', 'd', 'f'),
  num_ref_col = c(2, 4, NA, NA, NA)
)

df2 <- semi_join(df, ref, c('b' = 'num_ref_col'))
df3 <- semi_join(df, ref, c('b' = colnames(ref)[2]))
df2 == df3 # just to check
df4 <- semi_join(df, ref, c(colnames(df)[2] = colnames(ref)[2]))
```
`df2` returns the 8 rows of `df` where column `b` is 2 or 4.
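For intuition, a semi join on a single key keeps the rows of the left table whose key value appears in the right table's key column, and returns only the left table's columns. Below is a rough base-R sketch of the same filter for this data. It uses `data.frame` instead of `tibble` so it needs no packages. It also relies on `%in%`, which would count `NA` as matching `NA`, but that doesn't matter here because `b` has no missing values.

```r
# Rebuild the question's data as plain data frames (base R only)
df <- data.frame(
  a = letters[1:20],
  b = c(1:5, 1:5, 1:5, 1:5),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  let_ref_col = c("e", "g", "b", "d", "f"),
  num_ref_col = c(2, 4, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Keep the rows of df whose b value occurs anywhere in ref$num_ref_col
semi_base <- df[df$b %in% ref$num_ref_col, ]
nrow(semi_base)  # [1] 8
```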
R doesn't seem to mind me generalizing the second join variable name, as evidenced by `df3`.
When I try to apply the exact same logic to the first variable, I get an error from `df4`:

```
Error: unexpected '=' in "df4 <- inner_join(df,ref,c(colnames(df)[2]="
```
I'd love to be able to use a relative reference for both fields, if possible. Something like:

```r
JOIN_DESIRED <- 2
df5 <- semi_join(df, ref, c(colnames(df)[JOIN_DESIRED] = colnames(ref)[JOIN_DESIRED]))
```

which could be changed to 1 to join by letters instead of numbers.
Here is a workaround: we can use `names<-` to assign the names.

```r
df4 <- semi_join(df, ref, `names<-`(colnames(ref)[2], colnames(df)[2]))
identical(df2, df4)
# [1] TRUE
identical(df3, df4)
# [1] TRUE
```
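Base R's `setNames(object, nm)` (from the stats package, which is attached by default) does the same thing as the `names<-` call above and may read more naturally. The sketch below plugs it into the question's `JOIN_DESIRED` idea. Like the question, it assumes that the matching columns sit at the same position in `df` and `ref`.

```r
library(dplyr)

df <- tibble(
  a = letters[1:20],
  b = c(1:5, 1:5, 1:5, 1:5)
)
ref <- tibble(
  let_ref_col = c("e", "g", "b", "d", "f"),
  num_ref_col = c(2, 4, NA, NA, NA)
)

JOIN_DESIRED <- 2  # set to 1 to join on letters instead of numbers

# Build c(<df column> = <ref column>) from the index at run time
by_vec <- setNames(colnames(ref)[JOIN_DESIRED], colnames(df)[JOIN_DESIRED])
by_vec  # a named character vector: c(b = "num_ref_col")

df5 <- semi_join(df, ref, by = by_vec)
```

Because `by_vec` is an ordinary character vector, it can be computed from any user input before `semi_join()` is called.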
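You're doing a lot of things on one line with your last line, `semi_join(df,ref,c(colnames(df)[2]=colnames(ref)[2]))`. The problem is specifically this bit: `colnames(df)[2]=colnames(ref)[2]`. Inside a function call such as `c()`, R only accepts a bare name or a string literal to the left of `=`. An expression like `colnames(df)[2]` in that position is a syntax error, and it is reported before anything is evaluated. Here's how I might program it instead: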
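You can check that this fails at parse time, before `colnames()` or `semi_join()` ever run. Pass the text to `parse()`, which only builds the expression and evaluates nothing. A small base-R check:

```r
# A string literal (or bare name) is accepted as an argument name
by_literal <- c("b" = "num_ref_col")
print(names(by_literal))  # [1] "b"

# An expression to the left of `=` is rejected by the parser;
# tryCatch() captures the error instead of stopping the script
parse_result <- tryCatch(
  parse(text = "c(colnames(df)[2] = colnames(ref)[2])"),
  error = function(e) e
)
print(inherits(parse_result, "error"))  # [1] TRUE
```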
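```r
library(tidyverse)

df <- tibble(
  a = letters[1:20],
  b = c(1:5, 1:5, 1:5, 1:5)
)

ref <- tibble(
  let_ref_col = c('e', 'g', 'b', 'd', 'f'),
  num_ref_col = c(2, 4, NA, NA, NA)
)

# Give the idx-th column of both tables the same temporary name,
# join on it, then restore df1's original column name.
# Assumes the key column sits at the same position in df1 and df2.
semi_join_by_column_index <- function(df1, df2, idx) {
  original_name <- names(df1)[idx]
  names(df1)[idx] <- "join_column"
  names(df2)[idx] <- "join_column"
  new_df <- semi_join(df1, df2, by = "join_column")
  # Locate the temporary key column by name and rename it back
  new_idx <- match("join_column", names(new_df))
  names(new_df)[new_idx] <- original_name
  return(new_df)
}

merged_df <- semi_join_by_column_index(df, ref, idx = 2)
```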
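The helper above reuses one `idx` for both tables. If the key columns sit at different positions, a hypothetical variant can take one index per table and pass a named `by` vector instead of renaming columns. The name `semi_join_by_indices` and its arguments are illustrative, not part of dplyr. Passing the vector also avoids a clash if a table already has a column called `join_column`.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): idx_x picks df1's key column,
# idx_y picks df2's key column; defaults to the same position in both
semi_join_by_indices <- function(df1, df2, idx_x, idx_y = idx_x) {
  # Named by vector of the form c(<df1 column> = <df2 column>)
  by_vec <- setNames(names(df2)[idx_y], names(df1)[idx_x])
  semi_join(df1, df2, by = by_vec)
}

df <- tibble(
  a = letters[1:20],
  b = c(1:5, 1:5, 1:5, 1:5)
)
ref <- tibble(
  let_ref_col = c("e", "g", "b", "d", "f"),
  num_ref_col = c(2, 4, NA, NA, NA)
)

# Same key position in both tables, as with the helper above
merged_df <- semi_join_by_indices(df, ref, idx_x = 2)

# Key columns at different positions: ref's numeric key moved to column 1
ref_swapped <- select(ref, num_ref_col, let_ref_col)
merged_swapped <- semi_join_by_indices(df, ref_swapped, idx_x = 2, idx_y = 1)
```

Since the mapping goes through `by`, both tables keep their column names, so nothing has to be renamed back afterwards.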