
Correct variable values in a dataframe applying a function using variable-specific values in another dataframe in R

I have a df called 'covs' with sites in rows and, in columns, 9 different environmental variables for each site. I need to recalculate the value of each cell using the function (x - center_values(x)) / scale_values(x). However, 'center_values' and 'scale_values' are different for each environmental covariate, and they are stored in another df called 'correction'.

I have found many solutions for applying a function to a whole df, but none for applying specific values according to the name of the variable being transformed.

covs <- read.table(text = "X        elev   builtup     river       grip            pa       npp  treecov
384879-2009   1   24.379101 25188.572 1241.8348  1431.1082  5.705152e+03 16536.664 60.23175
385822-2009   2   29.533478 32821.770 2748.9053  1361.7772  2.358533e+03 15773.115 62.38455
385823-2009   3   30.097059 28358.244 2525.7627  1073.8772  4.340906e+03 14899.451 46.03269
386765-2009   4   33.877861 40557.891  927.4295  1049.4838  4.580944e+03 15362.518 53.08151
386766-2009   5   38.605156 36182.801 1479.6178  1056.2130  2.517869e+03 13389.958 35.71379", 
header= TRUE)

correction <- read.table(text = "var_name    center_values     scale_values
1        X            196.5 113.304898393671
2     elev 200.217889868483 307.718211316278
3  builtup 31624.4888660664 23553.2438790344
4    river 1390.41023742909 1549.88661649406
5     grip 5972.67361738244 6996.57793554527
6       pa 2731.33431010861 4504.71055521749
7      npp 10205.2997576655 2913.19658598938
8  treecov 47.9080656134352 17.7101565911347
9   nonveg 7.96755640452006 4.56625351682905", header= TRUE)

Could someone help me write code to recalculate the environmental covariate values in 'covs' using the covariate-specific values reported in 'correction'? E.g., for each value in the column 'elev' of the df 'covs', I need to subtract the 'center_values' entry reported for 'elev' in the 'correction' df, and then divide by the 'scale_values' entry reported for 'elev' in the 'correction' df. Thank you for your kind help.

You may assign var_name to the row names of correction, then loop over the names of covs and do the calculation for each column in an sapply.

rownames(correction) <- correction$var_name

res <- as.data.frame(sapply(names(covs), function(x) 
  (covs[, x] - correction[x, "center_values"])/correction[x, "scale_values"]))
res
#           X       elev     builtup       river       grip          pa      npp    treecov
# 1 -1.725433 -0.5714280 -0.27324970 -0.09586213 -0.6491124  0.66015733 2.173339  0.6958541
# 2 -1.716607 -0.5546776  0.05083296  0.87651254 -0.6590217 -0.08275811 1.911239  0.8174114
# 3 -1.707781 -0.5528462 -0.13867495  0.73253905 -0.7001703  0.35730857 1.611340 -0.1058927
# 4 -1.698956 -0.5405596  0.37928543 -0.29871910 -0.7036568  0.41059457 1.770295  0.2921174
# 5 -1.690130 -0.5251972  0.19353224  0.05755748 -0.7026950 -0.04738713 1.093183 -0.6885470

Check, e.g., "elev":

(covs[,"elev"] - correction["elev", "center_values"]) / correction["elev", "scale_values"]
# [1] -0.5714280 -0.5546776 -0.5528462 -0.5405596 -0.5251972
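
As a possible alternative, the same centering and scaling can be sketched with base R's scale(), aligning the rows of correction to the columns of covs with match() instead of changing the row names. This is only a sketch: it assumes every column of covs is numeric and has a matching var_name in correction, and the names idx and res2 are introduced here for illustration.

```r
# Example data copied from the question.
covs <- read.table(text = "X        elev   builtup     river       grip            pa       npp  treecov
384879-2009   1   24.379101 25188.572 1241.8348  1431.1082  5.705152e+03 16536.664 60.23175
385822-2009   2   29.533478 32821.770 2748.9053  1361.7772  2.358533e+03 15773.115 62.38455
385823-2009   3   30.097059 28358.244 2525.7627  1073.8772  4.340906e+03 14899.451 46.03269
385824-2009   4   33.877861 40557.891  927.4295  1049.4838  4.580944e+03 15362.518 53.08151
385825-2009   5   38.605156 36182.801 1479.6178  1056.2130  2.517869e+03 13389.958 35.71379",
header = TRUE)

correction <- read.table(text = "var_name    center_values     scale_values
1        X            196.5 113.304898393671
2     elev 200.217889868483 307.718211316278
3  builtup 31624.4888660664 23553.2438790344
4    river 1390.41023742909 1549.88661649406
5     grip 5972.67361738244 6996.57793554527
6       pa 2731.33431010861 4504.71055521749
7      npp 10205.2997576655 2913.19658598938
8  treecov 47.9080656134352 17.7101565911347
9   nonveg 7.96755640452006 4.56625351682905",
header = TRUE)

# Position in 'correction' of each column of 'covs' (hypothetical helper name).
idx <- match(names(covs), correction$var_name)

# Assumption: every covariate has a correction entry; stop early otherwise.
stopifnot(!anyNA(idx))

# scale() subtracts 'center' and divides by 'scale' column by column.
res2 <- as.data.frame(scale(covs,
                            center = correction$center_values[idx],
                            scale  = correction$scale_values[idx]))
```

Because match() follows the column order of covs, extra rows in correction (such as nonveg in this sample) are simply ignored, and the row names of covs are carried over to res2. Note that the site labels in the last two data rows of this sketch are placeholders, not the labels from the question.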

