Let's say I have 2 tables and their names are:
csvs <- c("jan", "feb")
I am looking to create a new column in each table that records its period, taken simply from the table's name. My attempt is:
lapply(csvs, function(x) eval(as.name(x))[, period := x])
Yes, I would prefer an apply-family function over a loop. However, I am getting the warning below:
Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.
I have looked up shallow copy but do not understand how it applies to my context. Any help would be appreciated.
T.Fung, if you're looking to add a column called, say, period to each data frame, with every value in that column being the name of that data frame, you could do it this way:
jan$period <- 'jan'
feb$period <- 'feb'
To do this in a loop:
# some example data
jan <- data.frame(some_data = 1:5, more_data = 1:5)
feb <- data.frame(some_data = 1:5, more_data = 1:5)
# vector of your table names
csvs <- c('jan', 'feb')
# loop to add a period column to each: builds and runs code such as
# jan$period <- 'jan'
for (i in seq_along(csvs)) {
  tmp <- paste0(csvs[i], "$period <- '", csvs[i], "'")
  eval(parse(text = tmp))
}
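The same loop can be written without building code as strings. This is a minimal sketch, assuming the tables live in the environment the loop runs in: get() fetches each data frame by name and assign() writes the modified copy back under the same name.

```r
# example data (same shape as above)
jan <- data.frame(some_data = 1:5, more_data = 1:5)
feb <- data.frame(some_data = 1:5, more_data = 1:5)
csvs <- c("jan", "feb")

# fetch each data frame by name, add the column, then overwrite the original
for (name in csvs) {
  df <- get(name)
  df$period <- name
  assign(name, df)
}
```

Because data frames are copied on modification, the assign() step is what makes the change visible under the original name.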
jan
#>   some_data more_data period
#> 1         1         1    jan
#> 2         2         2    jan
#> 3         3         3    jan
#> 4         4         4    jan
#> 5         5         5    jan
And here's how to do it with an apply function:
# some example data
jan <- data.frame(some_data = 1:5, more_data = 1:5)
feb <- data.frame(some_data = 1:5, more_data = 1:5)
# vector of your table names
csvs <- c('jan', 'feb')
# return a copy of the named data frame with a period column added
my_fun <- function(name) {
  df <- get(name)
  df$period <- name
  df
}
# apply the function and create a list of dataframes
dfs <- lapply(csvs, FUN = my_fun)
# name the dataframes in the list
names(dfs) <- csvs
# pull the dataframes out of the list and assign to the environment
lapply(names(dfs), function(x) assign(x, dfs[[x]], envir = .GlobalEnv))
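The collect, modify and write-back steps can also be expressed with a few base R helpers. This is a sketch, not the answer's own code: mget() gathers the named objects into a named list, Map() adds the column while walking each data frame alongside its name, and list2env() copies the results back into the global environment (wrapped in invisible() so nothing is printed).

```r
jan <- data.frame(some_data = 1:5, more_data = 1:5)
feb <- data.frame(some_data = 1:5, more_data = 1:5)
csvs <- c("jan", "feb")

# named list of the data frames, equivalent to list(jan = jan, feb = feb)
dfs <- mget(csvs)

# add a period column to each data frame, using its list name as the value;
# Map() keeps the names of its first argument, so dfs stays named jan/feb
dfs <- Map(function(df, name) {
  df$period <- name
  df
}, dfs, names(dfs))

# overwrite jan and feb in the global environment with the updated copies
invisible(list2env(dfs, envir = .GlobalEnv))
```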
#> [[1]]
#>   some_data more_data period
#> 1         1         1    jan
#> 2         2         2    jan
#> 3         3         3    jan
#> 4         4         4    jan
#> 5         5         5    jan
#> 
#> [[2]]
#>   some_data more_data period
#> 1         1         1    feb
#> 2         2         2    feb
#> 3         3         3    feb
#> 4         4         4    feb
#> 5         5         5    feb
# check dataframes for period column
jan
#>   some_data more_data period
#> 1         1         1    jan
#> 2         2         2    jan
#> 3         3         3    jan
#> 4         4         4    jan
#> 5         5         5    jan
feb
#>   some_data more_data period
#> 1         1         1    feb
#> 2         2         2    feb
#> 3         3         3    feb
#> 4         4         4    feb
#> 5         5         5    feb
If I just replace eval(as.name(x)) with get(x) (see the example below), your lapply solution works fine for me with data.table 1.13.6.
library(data.table)
test1 <- data.table(a = 1:3, b = 4:6)
test2 <- data.table(a = 7:9, b = 10:12)
dtNames <- c("test1", "test2")
# := adds dtName to each table in place; lapply also prints the updated tables
lapply(dtNames, function(x) get(x)[, dtName := x])
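data.table also offers set(), the loop-friendly counterpart of := whose j argument may name a column to be created. The sketch below assumes data.table is installed and that the tables were freshly made with data.table(), so they have their usual spare column slots. Because get() returns the table itself rather than a copy, set() adds the column by reference and nothing needs to be assigned back.

```r
library(data.table)

jan <- data.table(some_data = 1:3)
feb <- data.table(some_data = 4:6)
csvs <- c("jan", "feb")

# add a period column to each table by reference;
# the length-1 value is recycled to every row
for (x in csvs) set(get(x), j = "period", value = x)
```

Using a for loop here is deliberate: set() is designed for repeated calls, and the loop avoids the list of printed tables that lapply() returns.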