
Using lapply to create new columns based on old columns

My data looks as follows:

DF <- structure(list(No_Adjusted_Gross_Income = c(183454, 241199, 249506
), NoR_from_1_to_5000 = c(1035373, 4272260, 1124098), NoR_from_5000_to_10000 = c(319540, 
4826042, 1959866)), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"))
val <- c(2500.5, 7500)
vn <- c("AGI_from_1_to_5000", "AGI_from_5000_to_10000")

   No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000
1:                   183454            1035373                 319540
2:                   241199            4272260                4826042
3:                   249506            1124098                1959866

I would like to create new columns by multiplying columns 2 and 3 by the corresponding values in val, and to name the new columns using vn. I tried to do it as follows:

DF[,2:3] <- lapply(DF[,2:3], vn := val*DF[,2:3])

But this does not work.

Desired output:

DF <- setDT(DF)[, vn[1]:=val[1]*DF[,2]]
DF <- setDT(DF)[, vn[2]:=val[2]*DF[,3]]

DFout <- structure(list(No_Adjusted_Gross_Income = c(183454, 241199, 249506
), NoR_from_1_to_5000 = c(1035373, 4272260, 1124098), NoR_from_5000_to_10000 = c(319540, 
4826042, 1959866), AGI_from_1_to_5000 = c(2588950186.5, 10682786130, 
2810807049), AGI_from_5000_to_10000 = c(2396550000, 36195315000, 
14698995000)), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"))

   No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
1:                   183454            1035373                 319540         2588950187             2396550000
2:                   241199            4272260                4826042        10682786130            36195315000
3:                   249506            1124098                1959866         2810807049            14698995000

This should work; lapply() is not needed:

library( data.table )
setDT( DF )
# vn holds the new column names; the parentheses make := evaluate it
DF[, (vn) := as.data.table( t( t( DF[, 2:3] ) * val ) ) ][]


#    No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
# 1:                   183454            1035373                 319540         2588950187             2396550000
# 2:                   241199            4272260                4826042        10682786130            36195315000
# 3:                   249506            1124098                1959866         2810807049            14698995000
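
An alternative that avoids the double transpose is to pair each column with its multiplier via Map(). This is only a sketch, assuming the DF, val and vn from the question and that the columns selected by .SDcols are in the same order as val:

```r
library(data.table)

# Data from the question
DF <- data.table(
  No_Adjusted_Gross_Income = c(183454, 241199, 249506),
  NoR_from_1_to_5000       = c(1035373, 4272260, 1124098),
  NoR_from_5000_to_10000   = c(319540, 4826042, 1959866)
)
val <- c(2500.5, 7500)
vn  <- c("AGI_from_1_to_5000", "AGI_from_5000_to_10000")

# Map() multiplies the i-th selected column by the i-th element of val
# and returns a list of vectors, which := assigns to the names in vn.
DF[, (vn) := Map(`*`, .SD, val), .SDcols = 2:3]
DF[]
```

Because Map() works column by column, the data never has to be converted to a matrix and back.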

You can use apply to compute the values, then use cbind if you want to combine them with your original DF:

t(apply(DF[,2:3],1, function(x) x*val ))

     NoR_from_1_to_5000 NoR_from_5000_to_10000
[1,]         2588950187             2396550000
[2,]        10682786130            36195315000
[3,]         2810807049            14698995000
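
The cbind step could look roughly like this. It is a minimal sketch, assuming the DF, val and vn from the question; res is a hypothetical name for the intermediate matrix, and DFout is borrowed from the question:

```r
library(data.table)

# Data from the question
DF <- data.table(
  No_Adjusted_Gross_Income = c(183454, 241199, 249506),
  NoR_from_1_to_5000       = c(1035373, 4272260, 1124098),
  NoR_from_5000_to_10000   = c(319540, 4826042, 1959866)
)
val <- c(2500.5, 7500)
vn  <- c("AGI_from_1_to_5000", "AGI_from_5000_to_10000")

# Row-wise multiplication; apply() returns the rows as columns, hence t()
res <- t(apply(DF[, 2:3], 1, function(x) x * val))

# The matrix inherits the NoR_* names, so rename before binding
colnames(res) <- vn

# Convert the matrix to a data.table and append it to the original columns
DFout <- cbind(DF, as.data.table(res))
DFout
```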

The OP has asked in a comment for a grouping variable.

Although the accepted answer does what the OP initially asked for, I would like to suggest a completely different approach in which the data is stored and processed in tidy (long) format. IMHO, processing data in long format is much more straightforward and flexible (including for aggregation & grouping).

For this, the dataset is reshaped from wide, Excel-style format to long, SQL-style format by

library(data.table)
col <- "NoR"
long <- melt(DF, measure.vars = patterns(col), value.name = col, variable.name = "range")
long[, range := stringr::str_remove(range, paste0(col, "_"))]
long
   No_Adjusted_Gross_Income              range     NoR
1:                   183454     from_1_to_5000 1035373
2:                   241199     from_1_to_5000 4272260
3:                   249506     from_1_to_5000 1124098
4:                   183454 from_5000_to_10000  319540
5:                   241199 from_5000_to_10000 4826042
6:                   249506 from_5000_to_10000 1959866

In tidy (long) format there is one row for each observation and one column for each variable (see Chapter 12.2 of Hadley Wickham's book R for Data Science).

The vector of multipliers val also needs to be reshaped from wide to long format:

valDF <- long[, .(range = unique(range), val)]
valDF
                range    val
1:     from_1_to_5000 2500.5
2: from_5000_to_10000 7500.0

Now, valDF is also in tidy format as there is one row for each range .
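
The construction above relies on unique(range) returning the ranges in the same order as val. If that ordering is a concern, the lookup table could instead be built with explicit labels. This is only a sketch; deriving the labels from vn by stripping the "AGI_" prefix is an assumption about how the names are formed, not part of the original answer:

```r
library(data.table)

val <- c(2500.5, 7500)
vn  <- c("AGI_from_1_to_5000", "AGI_from_5000_to_10000")

# Strip the "AGI_" prefix so the labels match the values of long$range,
# and store each multiplier next to the range it belongs to.
valDF <- data.table(range = sub("^AGI_", "", vn), val = val)
valDF
```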

Finally, we can add a new column AGI to long by an update join:

long[valDF, on = "range", AGI := val * NoR][]
   No_Adjusted_Gross_Income              range     NoR         AGI
1:                   183454     from_1_to_5000 1035373  2588950187
2:                   241199     from_1_to_5000 4272260 10682786130
3:                   249506     from_1_to_5000 1124098  2810807049
4:                   183454 from_5000_to_10000  319540  2396550000
5:                   241199 from_5000_to_10000 4826042 36195315000
6:                   249506 from_5000_to_10000 1959866 14698995000
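
Since a global vector val and a column val in valDF both exist at this point, it may be clearer to spell out which one is meant. The following self-contained sketch rebuilds the two tables by hand (an assumption made only to keep the example runnable on its own) and uses data.table's i. prefix to refer to the column coming from valDF:

```r
library(data.table)

# Long-format data, as produced by melt() above
long <- data.table(
  No_Adjusted_Gross_Income = rep(c(183454, 241199, 249506), times = 2),
  range = rep(c("from_1_to_5000", "from_5000_to_10000"), each = 3),
  NoR   = c(1035373, 4272260, 1124098, 319540, 4826042, 1959866)
)

# One multiplier per range
valDF <- data.table(
  range = c("from_1_to_5000", "from_5000_to_10000"),
  val   = c(2500.5, 7500)
)

# Update join: for every row of long that matches valDF on range,
# i.val is the multiplier taken from valDF; long is modified by reference.
long[valDF, on = "range", AGI := i.val * NoR]
long[]
```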

If required for presentation, the dataset can be reshaped back from long to wide format:

dcast(long, No_Adjusted_Gross_Income ~ range, value.var = c("NoR", "AGI"))
   No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
1:                   183454            1035373                 319540         2588950187             2396550000
2:                   241199            4272260                4826042        10682786130            36195315000
3:                   249506            1124098                1959866         2810807049            14698995000

which reproduces OP's expected result. Note that the variable names vn are created automagically.


Aggregation and grouping can be performed while reshaping

dcast(long, No_Adjusted_Gross_Income ~ range, sum, value.var = c("NoR", "AGI"))
   No_Adjusted_Gross_Income NoR_from_1_to_5000 NoR_from_5000_to_10000 AGI_from_1_to_5000 AGI_from_5000_to_10000
1:                   183454            1035373                 319540         2588950187             2396550000
2:                   241199            4272260                4826042        10682786130            36195315000
3:                   249506            1124098                1959866         2810807049            14698995000

or

dcast(long, No_Adjusted_Gross_Income ~ ., sum, value.var = c("NoR", "AGI"))
   No_Adjusted_Gross_Income     NoR         AGI
1:                   183454 1354913  4985500187
2:                   241199 9098302 46878101130
3:                   249506 3083964 17509802049

Alternatively, aggregation & grouping can be performed in long format:

long[, lapply(.SD, sum), .SDcols = c("NoR", "AGI"), by = No_Adjusted_Gross_Income]
   No_Adjusted_Gross_Income     NoR         AGI
1:                   183454 1354913  4985500187
2:                   241199 9098302 46878101130
3:                   249506 3083964 17509802049
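
The grouping variable the OP asked about is not shown in the question, so the following is only an illustration with a made-up column. The name grp and its values "A" and "B" are hypothetical; the rest mirrors the long-format data built above:

```r
library(data.table)

# Long-format data with AGI already computed via the update join
long <- data.table(
  No_Adjusted_Gross_Income = rep(c(183454, 241199, 249506), times = 2),
  range = rep(c("from_1_to_5000", "from_5000_to_10000"), each = 3),
  NoR   = c(1035373, 4272260, 1124098, 319540, 4826042, 1959866)
)
valDF <- data.table(
  range = c("from_1_to_5000", "from_5000_to_10000"),
  val   = c(2500.5, 7500)
)
long[valDF, on = "range", AGI := i.val * NoR]

# HYPOTHETICAL grouping variable, not part of the original data
long[, grp := rep(c("A", "A", "B"), times = 2)]

# Sum NoR and AGI within each combination of group and range
agg <- long[, lapply(.SD, sum), .SDcols = c("NoR", "AGI"), by = .(grp, range)]
agg
```

In long format, adding or changing the grouping only means editing the by argument; the column names do not have to be touched.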
