R: cannot run partial least squares regression on more than one descriptor

I generated a CSV table, "T.CSV":

"system","response","NIR.a","NIR.b"
 1,1,2,3
 2,4,5,6
 3,7,8,9

for which plsr succeeds with one descriptor but fails with multiple descriptors:

> library(pls)
> j <- read.csv(file="T.CSV",header=T,sep=",")
> head(j)
  system response NIR.a NIR.b
1      1        1     2     3
2      2        4     5     6
3      3        7     8     9
> mod <- plsr(response ~ NIR.a , data = j ,  ncomp=1 )
> mod <- plsr(response ~ NIR , data = j ,  ncomp=1 )
Error in eval(expr, envir, enclos) : object 'NIR' not found

However, if I load the "oliveoil" example data from the pls package, the regression works with more than one descriptor:

> data(oliveoil)
> head(oliveoil)
   chemical.Acidity chemical.Peroxide chemical.K232 chemical.K270 chemical.DK
G1             0.73              12.7         1.9           0.139       0.003
G2             0.19              12.3         1.678         0.116      -0.004
G3             0.26              10.3         1.629         0.116      -0.005
G4             0.67              13.7         1.701         0.168      -0.002
G5             0.52              11.2         1.539         0.119      -0.001
I1             0.26              18.7         2.117         0.142       0.001
   sensory.yellow sensory.green sensory.brown sensory.glossy sensory.transp
G1           21.4          73.4          10.1           79.7           75.2
G2           23.4          66.3           9.8           77.8           68.7
G3           32.7          53.5           8.7           82.3           83.2
G4           30.2          58.3          12.2           81.1           77.1
G5           51.8          32.5             8           72.4           65.3
I1           40.7          42.9          20.1           67.7           63.5
   sensory.syrup
G1          50.3
G2          51.7
G3          45.4
G4          47.8
G5          46.5
I1          52.2

Here plsr works with multiple descriptors:

> mod <- plsr(chemical ~ sensory , data = oliveoil ,  ncomp=1 )
>

Can you please advise on what I did wrong with my first table?

Thanks in advance!

If we look at str(oliveoil), 'sensory' is a matrix with several columns, stored as a single column of the data.frame. So, to use the formula in that way, "NIR" must also be a matrix inside the data.frame:

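# keep the non-descriptor columns (system, response)
j1 <- j[1:2]
# store NIR.a and NIR.b as one two-column matrix (columns renamed a, b) in a single column "NIR"
j1["NIR"] <- as.matrix(setNames(j[3:4], letters[1:2]))
mod <- plsr(response ~ NIR , data = j1 ,  ncomp=1 )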
str(mod)
#List of 19
# $ coefficients   : num [1:2, 1, 1] 0.5 0.5
# ..- attr(*, "dimnames")=List of 3
# .. ..$ : chr [1:2] "a" "b"
# .. ..$ : chr "response"
# .. ..$ : chr "1 comps"
# ----
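
The same matrix column can also be built by selecting the descriptor columns by name prefix, which may be handier when there are many NIR.* columns. This is only a minimal base-R sketch under that assumption: the names nir_cols, nir_matrix and j2 are made up for illustration, and j is rebuilt inline so the snippet runs on its own.

```r
# Rebuild the question's table so the snippet is self-contained.
j <- data.frame(system = 1:3,
                response = c(1L, 4L, 7L),
                NIR.a = c(2L, 5L, 8L),
                NIR.b = c(3L, 6L, 9L))

# Pick every column whose name starts with "NIR." (illustrative helper names).
nir_cols <- grep("^NIR\\.", names(j), value = TRUE)

# Combine those columns into one matrix and drop the "NIR." prefix
# from its column names.
nir_matrix <- as.matrix(j[nir_cols])
colnames(nir_matrix) <- sub("^NIR\\.", "", nir_cols)

# Store the whole matrix as a single data.frame column called NIR.
j2 <- j[c("system", "response")]
j2$NIR <- nir_matrix

str(j2)  # NIR should be listed as one matrix-valued column
```

With j2 in this shape, plsr(response ~ NIR, data = j2, ncomp = 1) should be able to find NIR in the same way it finds sensory in oliveoil.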

data

j <- structure(list(system = 1:3, response = c(1L, 4L, 7L),
  NIR.a = c(2L, 5L, 8L), NIR.b = c(3L, 6L, 9L)),
  .Names = c("system", "response", "NIR.a", "NIR.b"),
  class = "data.frame", row.names = c(NA, -3L))

In your command, mod <- plsr(response ~ NIR , data = j , ncomp=1 ), make sure the names of the explanatory variables on the right-hand side of the ~ match the column names in the data exactly (spelling and upper/lower case). In the output of head(j) there is no column called NIR, but there is one called NIR.a and one called NIR.b. Have you checked whether replacing NIR with NIR.a or NIR.b works?
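
One way to watch this name lookup without fitting anything is base R's model.frame, which resolves the variable names in a formula against the columns of data. The sketch below assumes plsr relies on the same kind of lookup, which is consistent with the error message in the question. It rebuilds j inline; msg and mf are illustrative names, and the final plsr call is guarded so that it only runs if the pls package is installed.

```r
# Rebuild the question's table so the snippet is self-contained.
j <- data.frame(system = 1:3,
                response = c(1L, 4L, 7L),
                NIR.a = c(2L, 5L, 8L),
                NIR.b = c(3L, 6L, 9L))

# There is no column (or other object) named NIR, so resolving the
# formula fails; capture the error message instead of stopping.
msg <- tryCatch(model.frame(response ~ NIR, data = j),
                error = function(e) conditionMessage(e))
print(msg)

# Naming the existing columns explicitly resolves without error.
mf <- model.frame(response ~ NIR.a + NIR.b, data = j)
print(names(mf))

# The same formula can be passed to plsr (run only if pls is available).
if (requireNamespace("pls", quietly = TRUE)) {
  mod <- pls::plsr(response ~ NIR.a + NIR.b, data = j, ncomp = 1)
  print(dim(coef(mod)))
}
```

Listing the columns one by one is manageable for two descriptors; with many spectral columns, storing them as a single matrix column is usually more convenient.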
