R, data.table - create new column for multiple tables

Let's say I have 2 tables and their names are:

csvs <- c("jan", "feb")

I am looking to create a new column within each table that denotes their period by simply taking the df's name. My attempt is:

lapply(csvs, function(x)  eval(as.name(x))[, period := x])

Yes, I would prefer an apply over a loop. However, I am receiving the error below:

Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.

I have looked up shallow copy but do not understand how it applies to my context. Any help would be appreciated.

T.Fung, if you're looking to add a column to each data frame called, say, period, with every value in that column being the name of the data frame, you could do it this way:

jan$period <- 'jan'
feb$period <- 'feb'

To do this in a loop:

# some example data
jan <- data.frame('some_data' = 1:5, 'more_data' = 1:5)
feb <- data.frame('some_data' = 1:5, 'more_data' = 1:5)

# vector of your table names
csvs <- c('jan', 'feb')

# loops to add period column to each
for (i in seq_along(csvs)) {
  # build the statement as text, e.g. jan$period <- 'jan', then evaluate it
  tmp <- paste0(csvs[i], '$period <- \'', csvs[i], '\'')
  eval(parse(text = tmp))
}

jan
#>   some_data more_data period
#> 1         1         1    jan
#> 2         2         2    jan
#> 3         3         3    jan
#> 4         4         4    jan
#> 5         5         5    jan
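
The same loop can be written without building code as text. This is only a sketch of an alternative, assuming the data frames live in the environment where the loop runs; the loop variable nm and the temporary df are names chosen here for illustration. It uses base R's get() and assign() to look each data frame up by name and write the modified copy back.

```r
# example data, same shape as above
jan <- data.frame(some_data = 1:5, more_data = 1:5)
feb <- data.frame(some_data = 1:5, more_data = 1:5)

# vector of table names
csvs <- c("jan", "feb")

for (nm in csvs) {
  df <- get(nm)      # look the data frame up by its name
  df$period <- nm    # add the constant column to the copy
  assign(nm, df)     # store the modified copy back under the same name
}
```

Because get() and assign() work on names directly, there is no string quoting to get right, which is the main reason to prefer them over eval(parse()).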

And here's how to do it with an apply function:

# some example data
jan <- data.frame('some_data' = 1:5, 'more_data' = 1:5)
feb <- data.frame('some_data' = 1:5, 'more_data' = 1:5)

# vector of your table names
csvs <- c('jan', 'feb')

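# Add a period column to the dataframe called `name` and return the modified copy
my_fun <- function(name){
  tmp <- paste0(name, '$period <- \'', name, '\'')
  eval(parse(text = tmp))          # modifies a copy local to the function
  df <- eval(parse(text = name))   # fetch that local, modified copy
  return(df)
}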

# apply the function and create a list of dataframes
dfs <- lapply(csvs, FUN = my_fun)

# name the dataframes in the list
names(dfs) <- csvs

# pull the dataframes out of the list and assign to the environment
lapply(names(dfs), function(x) assign(x, dfs[[x]], envir = .GlobalEnv))
#> [[1]]
#>   some_data more_data period
#> 1         1         1    jan
#> 2         2         2    jan
#> 3         3         3    jan
#> 4         4         4    jan
#> 5         5         5    jan
#> 
#> [[2]]
#>   some_data more_data period
#> 1         1         1    feb
#> 2         2         2    feb
#> 3         3         3    feb
#> 4         4         4    feb
#> 5         5         5    feb

# check dataframes for period column
jan
#>   some_data more_data period
#> 1         1         1    jan
#> 2         2         2    jan
#> 3         3         3    jan
#> 4         4         4    jan
#> 5         5         5    jan
feb
#>   some_data more_data period
#> 1         1         1    feb
#> 2         2         2    feb
#> 3         3         3    feb
#> 4         4         4    feb
#> 5         5         5    feb
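
A shorter variant of the apply approach, offered as a sketch rather than a drop-in replacement: collect the data frames into a named list with mget(), add the column with Map(), and copy the results back with list2env(). The helper name add_period is made up for this example, and the code assumes it is run at the top level so that environment() is the environment holding jan and feb.

```r
# example data, same shape as above
jan <- data.frame(some_data = 1:5, more_data = 1:5)
feb <- data.frame(some_data = 1:5, more_data = 1:5)
csvs <- c("jan", "feb")

# named list of data frames: list(jan = ..., feb = ...)
dfs <- mget(csvs)

# hypothetical helper: add the name as a constant column and return the copy
add_period <- function(df, nm) {
  df$period <- nm
  df
}

# pair each data frame with its own name
dfs <- Map(add_period, dfs, names(dfs))

# copy the modified data frames back over the originals
invisible(list2env(dfs, envir = environment()))
```

If the tables are only going to be processed together afterwards, keeping them in the list dfs and skipping the last step avoids writing to the environment at all.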

If I just replace eval(as.name(x)) with get(x) (see example below), your lapply solution works fine for me with data.table 1.13.6.

library(data.table)

test1 <- data.table(a = 1:3, b = 4:6)
test2 <- data.table(a = 7:9, b = 10:12)
dtNames <- c("test1", "test2")

lapply(dtNames, function(x) get(x)[, dtName := x])
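
For completeness, here is a sketch of the same idea using set(), which the warning text itself points to (?set). It assumes the tables are ordinary data.tables created in the current session, as in the example above; the loop variable nm is a name chosen for illustration. A plain for loop is used because set() is called only for its side effect of adding the column by reference, so there is nothing useful to collect in a list.

```r
library(data.table)

test1 <- data.table(a = 1:3, b = 4:6)
test2 <- data.table(a = 7:9, b = 10:12)
dtNames <- c("test1", "test2")

for (nm in dtNames) {
  # get() returns the table itself, and set() adds the column by reference;
  # the single value nm is recycled to every row
  set(get(nm), j = "dtName", value = nm)
}

names(test1)
#> [1] "a"      "b"      "dtName"
```

If you prefer to keep lapply, wrapping the call in invisible() stops the modified tables from being printed at the console.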
