简体   繁体   中英

Rescale multiple variable at once

I would like to rescale multiple variables at once. Each one of the variable should be rescaled between 0 and 10. My dataset looks something like this

df<-structure(list(Year = 1985:2012, r_mean_dp_C_EU_PTA = c(0.166685371371432,0, 0.340384674048008, 0.255663634111618, 0.137833312888481, 0.215940736735375,0.695926742038269, 1.12488458324014, 1.50426967770413, 1.96800275204271, 
                                                             1.84220420613839, 2.55081439923073, 2.83958315572122, 3.02471358081631, 2.76227596053162, 5.13672466755955, 6.22501740311663, 6.04685020876299, 
                                                             5.48990293535953, 5.74245144436088, 6.87554176822673, 5.35866756802216,6.21821261660873, 7.39740372167956, 7.37052059919359, 8.4053331043966, 
                                                             7.88284279150424, 10), 
                    r_mean_dp_C_US_PTA = c(0, 0.0243131684738152,  0.0295348762350131, 1.24572619158458, 1.20624633452509, 1.57418568231032,1.45479246796848, 2.38700784566208, 2.62865525326503, 2.26401361870534,2.67319203680329, 2.64440548764366, 3.10459526464658, 3.05231530072328, 
                                                                                                           3.32660416229216, 4.14909239351474, 3.76404440984403, 3.79766644256544,4.55279786294561, 5.57506946922008, 6.83412605593388, 8.07241989452914,9.10370786838265, 9.51564633960853, 8.64357423479438, 9.10723202296861,10, 9.06442082870898), 
                    r_mean_dp_C_eu_esr_sum = c(0.0267071299038037,0, 0.0481033555876806, 0.039231355183461, 0.0255363040160583,0.0284158726695472, 0.234715155525714, 0.544954230234254, 0.683338138878583, 0.828929653572072, 0.950656658215744, 1.21492080702167, 1.30147631753441, 1.36122263965133, 1.33106989847101, 1.7848396827464, 2.19247065377408, 2.1506217173316, 4.91794342139369, 4.83398913690854, 7.28545175419305,5.42827409024432, 7.34375238832023, 8.91410171271897, 8.98533852868884,  9.17361943843028, 9.21421152468197, 10)), row.names = c(NA, -28L
                                                                                                                                                             ), 
class = c("data.table", "data.frame"))

I have tried to use the package scales but it does not work

While the function with name identifiers fails

library(scales)
vars<-names(df[,2:4])
tst<-setDT(df)[, (vars):=lapply((vars), function(x) rescale(x,to = c(0,10)))]

Using position identifiers sets all the variable values to 5 which is not what I am looking for.

tst<-setDT(df)[, 2:4:=lapply(2:4, function(x) rescale(x,to = c(0,10)))]
tst
#    Year r_mean_dp_C_EU_PTA r_mean_dp_C_US_PTA r_mean_dp_C_eu_esr_sum
# 1: 1985                  5                  5                      5
# 2: 1986                  5                  5                      5
# 3: 1987                  5                  5                      5
# 4: 1988                  5                  5                      5
# 5: 1989                  5                  5                      5
# 6: 1990                  5                  5                      5
# 7: 1991                  5                  5                      5
# 8: 1992                  5                  5                      5
# 9: 1993                  5                  5                      5
# 10: 1994                  5                  5                      5
# 11: 1995                  5                  5                      5
# 12: 1996                  5                  5                      5
# 13: 1997                  5                  5                      5
# 14: 1998                  5                  5                      5
# 15: 1999                  5                  5                      5
# 16: 2000                  5                  5                      5
# 17: 2001                  5                  5                      5
# 18: 2002                  5                  5                      5
# 19: 2003                  5                  5                      5
# 20: 2004                  5                  5                      5
# 21: 2005                  5                  5                      5
# 22: 2006                  5                  5                      5
# 23: 2007                  5                  5                      5
# 24: 2008                  5                  5                      5
# 25: 2009                  5                  5                      5
# 26: 2010                  5                  5                      5
# 27: 2011                  5                  5                      5
# 28: 2012                  5                  5                      5

Does anyone know what I am doing wrong?

Thanks a lot in advance for your help

I am a bit confused what the exact output needs to be, as in this example everything is between 0 and 10.

Did you try to use dplyr ?

tst <- df %>% 
  mutate_at(vars, function(x) rescale(x,to = c(0,10)) )

resulted in:

   Year r_mean_dp_C_EU_PTA r_mean_dp_C_US_PTA r_mean_dp_C_eu_esr_sum
1  1985          0.1515322         0.00000000             0.02670713
2  1986          0.0000000         0.02431317             0.00000000
3  1987          0.3094406         0.02953488             0.04810336
4  1988          0.2324215         1.24572619             0.03923136
5  1989          0.1253030         1.20624633             0.02553630
6  1990          0.1963098         1.57418568             0.02841587
7  1991          0.6326607         1.45479247             0.23471516
8  1992          1.0226223         2.38700785             0.54495423
9  1993          1.3675179         2.62865525             0.68333814
10 1994          1.7890934         2.26401362             0.82892965
11 1995          1.6747311         2.67319204             0.95065666
12 1996          2.3189222         2.64440549             1.21492081
13 1997          2.5814392         3.10459526             1.30147632
14 1998          2.7497396         3.05231530             1.36122264
15 1999          2.5111600         3.32660416             1.33106990
16 2000          4.6697497         4.14909239             1.78483968
17 2001          5.6591067         3.76404441             2.19247065
18 2002          5.4971366         3.79766644             2.15062172
19 2003          4.9908209         4.55279786             4.91794342
20 2004          5.2204104         5.57506947             4.83398914
21 2005          6.2504925         6.83412606             7.28545175
22 2006          4.8715160         8.07241989             5.42827409
23 2007          5.6529206         9.10370787             7.34375239
24 2008          6.7249125         9.51564634             8.91410171
25 2009          6.7004733         8.64357423             8.98533853
26 2010          7.6412119         9.10723202             9.17361944
27 2011          7.1662207        10.00000000             9.21421152
28 2012         10.0000000         9.06442083            10.00000000

Is this what you want?

We can use .SDcols .

To apply by names

library(data.table)

df[, (vars):= lapply(.SD, scales::rescale, to = c(0, 10)), .SDcols = vars]

To apply by position

df[, 2:4 := lapply(.SD, scales::rescale, to = c(0, 10)), .SDcols = 2:4]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM