
(R, data.table): Subset rows based on logical values in columns with dynamically assigned column names

I have a data table with two columns whose names are built from variables. I'm a little new to the quirks of the data.table package, but so far I've gotten something like the following code to work...

varNames <- c("Subtype", ...)

for (i in seq_along(varNames)) {  # seq_along(), not length(), so every name is visited

  nm1 <- paste0(varNames[i], "1")
  nm2 <- paste0(varNames[i], "2")
  
  DT[,(nm1):= x1]
  DT[,(nm2):= x2]
  
  #A BUNCH OF OTHER CODE GOES HERE...

}
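A runnable sketch of the loop above. It assumes a hypothetical toy DT whose source columns are literally named x1 and x2, and a hypothetical one-element varNames, since the question elides both. It iterates with seq_along(varNames); for (i in length(varNames)) would run only once, with i set to the last index.

```r
library(data.table)

# Hypothetical toy data: x1 and x2 stand in for the question's source columns.
DT <- data.table(x1 = c("A", "B", "C"), x2 = c("A", "C", "C"))
varNames <- c("Subtype")  # hypothetical; the real vector is elided in the question

for (i in seq_along(varNames)) {
  nm1 <- paste0(varNames[i], "1")  # e.g. "Subtype1"
  nm2 <- paste0(varNames[i], "2")  # e.g. "Subtype2"

  # Parentheses around nm1 make := use the *value* of nm1 as the new column name
  DT[, (nm1) := x1]
  DT[, (nm2) := x2]
}

print(names(DT))
```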

I want to single out the rows where the column named by nm1 and the column named by nm2 match, but I know I can't just do this...

nmMatch <- (paste0(varNames[i],"Match"))
DT[, (nmMatch) := F ]
DT[(nm1)==(nm2), (nmMatch) := T] #Returns empty data table :^(

I think this is either because there are no columns actually named "nm1" or "nm2" or because the variable named nm1 does not equal the variable named nm2.
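A small sketch of why the attempt above updates nothing, using a hypothetical two-row table. Because nm1 and nm2 are not column names, data.table resolves them to the character strings "Subtype1" and "Subtype2" from the calling scope, so (nm1) == (nm2) compares the two names rather than the two columns and yields a single FALSE.

```r
library(data.table)

# Hypothetical toy data
DT <- data.table(Subtype1 = c("A", "B"), Subtype2 = c("A", "C"))
nm1 <- "Subtype1"
nm2 <- "Subtype2"
nmMatch <- "SubtypeMatch"

# Comparing the strings themselves: one FALSE, whatever the columns contain
print((nm1) == (nm2))

DT[, (nmMatch) := FALSE]
DT[(nm1) == (nm2), (nmMatch) := TRUE]  # i is FALSE, so no row is updated
print(DT)
```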

If I didn't need to assign these based on a vector of character values, I would write this to get what I'm looking for...

DT[, "SubtypeMatch" := F]    
DT[(Subtype1) == (Subtype2), SubtypeMatch := T]

How do I get a subset of rows based on column values if I need to reference those column names through variables? Is there a way to do that with data.table? These end up being huge structures (> 1000000 rows), so any workarounds using sapply() end up being prohibitively slow.

I recognize that there may be ways I could fundamentally restructure my code so that I never need to do this, and I'm happy to hear those, but I'm also interested in any "proper" way to accomplish this subsetting task with data.table.

Use get(), which looks up a column by the name stored in a variable:

library(data.table)
DT[, (nmMatch) := FALSE]                     # start with every row FALSE
DT[get(nm1) == get(nm2), (nmMatch) := TRUE]  # get() fetches the columns named by nm1 and nm2
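A self-contained sketch of the get() answer inside the question's loop, run on a hypothetical four-row table and a hypothetical one-element varNames. It also shows an equivalent that extracts the columns with [[ and uses the comparison as the row filter; the column name SubtypeMatchAlt is made up for the comparison. Both forms compare whole columns in one vectorized step, so no sapply() is involved. Rows where either value is NA stay FALSE, because data.table treats NA in a logical i as FALSE.

```r
library(data.table)

# Hypothetical toy data; the real table is much larger
DT <- data.table(Subtype1 = c("A", "B", "C", NA),
                 Subtype2 = c("A", "C", "C", "D"))
varNames <- c("Subtype")  # hypothetical

for (i in seq_along(varNames)) {
  nm1     <- paste0(varNames[i], "1")
  nm2     <- paste0(varNames[i], "2")
  nmMatch <- paste0(varNames[i], "Match")

  DT[, (nmMatch) := FALSE]
  # get() resolves each string to the column of that name inside DT
  DT[get(nm1) == get(nm2), (nmMatch) := TRUE]
}

# Alternative without get(): pull the columns out by name with [[
DT[, SubtypeMatchAlt := FALSE]
DT[DT[[nm1]] == DT[[nm2]], SubtypeMatchAlt := TRUE]

print(DT)
```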
