
Can someone explain the 'unexpected '='' message in my semi_join function in R when I use relative references?

I'm trying to build a script in R that will join on different fields based on user input. I'm running version 0.7.6 of dplyr through tidyverse (1.2.1).

I could build multiple mostly identical join statements and reference different ones based on the input, but that seems inelegant. Below is an example with commentary underneath that. I'm still kind of new to R, so I apologize if this itself is inelegant:

library(tidyverse)
df <- tibble(
  a = letters[1:20],
  b = c(1:5,1:5,1:5,1:5)
)

ref <- tibble(
  let_ref_col = c('e','g','b','d','f'),
  num_ref_col = c(2,4,NA,NA,NA)
)

df2 <- semi_join(df,ref,c('b'='num_ref_col'))

df3 <- semi_join(df,ref,c('b'=colnames(ref)[2]))
df2==df3 #just to check

df4 <- semi_join(df,ref,c(colnames(df)[2]=colnames(ref)[2]))

df2 will return 8 rows where column b in df is 2 or 4.

R doesn't seem to mind me generalizing the second join variable name, as evidenced by df3.

When I try to apply the exact same logic to the first variable, I get an error message from df4:

Error: unexpected '=' in "df4 <- inner_join(df,ref,c(colnames(df)[2]="

I'd love to be able to have a relative reference for both fields if possible. Something like:

JOIN_DESIRED <- 2
df5 <- semi_join(df,ref,c(colnames(df)[JOIN_DESIRED] = colnames(ref)[JOIN_DESIRED]))

Which can be changed to 1 to join by letters instead of numbers.

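Here is a workaround. We can call the names<- replacement function directly to build the named vector that the join expects, with the ref column as the value and the df column as its name.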

df4 <- semi_join(df, ref, `names<-`(colnames(ref)[2], colnames(df)[2]))

identical(df2, df4)
# [1] TRUE

identical(df3, df4)
# [1] TRUE
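The same idea can be written with setNames() from base R, which some readers find easier to follow than calling `names<-` directly. The sketch below is only an illustration: the helper name join_by_position is hypothetical, and it assumes the two columns to join on sit at the same position in both tables, as they do in the question's example.

```r
library(dplyr)

df <- tibble(
  a = letters[1:20],
  b = c(1:5, 1:5, 1:5, 1:5)
)

ref <- tibble(
  let_ref_col = c('e', 'g', 'b', 'd', 'f'),
  num_ref_col = c(2, 4, NA, NA, NA)
)

# Hypothetical helper: semi-join x to y on the column at position `pos`
# in each table.
join_by_position <- function(x, y, pos) {
  # setNames(value, name): the value is the column name in y,
  # the name is the column name in x, i.e. c("x_col" = "y_col")
  by <- setNames(colnames(y)[pos], colnames(x)[pos])
  semi_join(x, y, by = by)
}

JOIN_DESIRED <- 2
df5 <- join_by_position(df, ref, JOIN_DESIRED)  # joins b to num_ref_col

df6 <- join_by_position(df, ref, 1)             # joins a to let_ref_col
```

Changing JOIN_DESIRED from 2 to 1 switches the join from the numeric columns to the letter columns without touching the join call itself.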

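You're doing a lot of things on one line with your last line semi_join(df,ref,c(colnames(df)[2]=colnames(ref)[2])). The problem is this bit: colnames(df)[2]=colnames(ref)[2]. Inside c(), whatever sits to the left of = has to be a literal name (a bare symbol or a quoted string), not an expression, so R rejects the line at parse time before anything is evaluated. That is also why 'b'=colnames(ref)[2] works: the right-hand side can be any expression. Here's how I might program it: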

library(tidyverse)

df <- tibble(
  a = letters[1:20],
  b = c(1:5,1:5,1:5,1:5)
)

ref <- tibble(
  let_ref_col = c('e','g','b','d','f'),
  num_ref_col = c(2,4,NA,NA,NA)
)

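semi_join_by_column_index <- function(df1, df2, idx) {
  # Remember the original name so it can be restored after the join
  original_name <- names(df1)[idx]

  # Give the join column the same temporary name in both tables
  # (assumes neither table already has a column called "join_column")
  names(df1)[idx] <- "join_column"
  names(df2)[idx] <- "join_column"

  new_df <- semi_join(df1, df2, by = "join_column")

  # semi_join() keeps only df1's columns, so put the original name back
  new_idx <- match("join_column", names(new_df))
  names(new_df)[new_idx] <- original_name

  return(new_df)
}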

merged_df <- semi_join_by_column_index(df, ref, idx = 2)
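To see that the original error comes from the parser and not from the join itself, the failing expression can be handed to parse() as text. This is a small base-R sketch for illustration only; the variable names bad_code, parse_msg, by_literal and by_built are made up for the example.

```r
# The expression from the question, as text so it can be parsed on demand.
bad_code <- "c(colnames(df)[2] = colnames(ref)[2])"

# parse() only checks syntax; df and ref do not need to exist here.
parse_msg <- tryCatch(
  {
    parse(text = bad_code)
    "parsed without error"
  },
  error = function(e) conditionMessage(e)
)

# A literal name on the left of `=` is accepted by the parser.
by_literal <- c("b" = "num_ref_col")

# Building the same named vector in two steps lets the name be computed.
by_built <- "num_ref_col"
names(by_built) <- "b"

same_vector <- identical(by_literal, by_built)
```

Here parse_msg holds the parser's complaint about the unexpected '=', and same_vector is TRUE, which shows that a named vector built at run time is interchangeable with the literal c('b' = 'num_ref_col') form.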
