
Sparse partial least squares regression

I have two datasets, as follows:

     http://www.filedropper.com/dataa_1 ## DataA
     http://www.filedropper.com/datab   ## DataB

DataA has 42 rows and 8 columns, and DataB has 42 rows and 6 columns. We want to run CCA and sPLS on these two datasets in R. My question is this: when we look at DataB, each group of eleven rows always has the same values. Will this affect the results or cause a discrepancy in either the CCA or the sPLS?

After looking at block B, it looks like the variables are discrete.
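A quick way to confirm that impression is to count the distinct values in each column and the distinct rows overall. The sketch below is in plain Python rather than R, and `block_b` is a made-up toy table standing in for DataB (the real file is not reproduced here); `distinct_counts` is a hypothetical helper name.

```python
def distinct_counts(rows):
    """Return the number of distinct values in each column of a row-major table."""
    n_cols = len(rows[0])
    return [len({row[j] for row in rows}) for j in range(n_cols)]


# Hypothetical toy block (NOT the real DataB): 6 rows, 2 columns,
# with values repeated across consecutive rows.
block_b = [
    (1.0, 10.0), (1.0, 10.0), (1.0, 10.0),
    (2.0, 10.0), (2.0, 20.0), (2.0, 20.0),
]

counts = distinct_counts(block_b)      # distinct values per column
n_distinct_rows = len(set(block_b))    # distinct rows in the whole block

print(counts, n_distinct_rows)
```

A column with only a handful of distinct values across 42 rows is a strong hint that it is discrete or group-level rather than continuous.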

Using such variables in PLS or CCA is not a (technical) problem, but it does pose statistical "challenges": the bootstrap or the jackknife may be required to go further into the statistical interpretation of the results.
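As a rough illustration of the bootstrap idea, the sketch below resamples rows of both blocks jointly and recomputes one simple statistic: the largest singular value of the cross-covariance matrix, which is the covariance between the first pair of PLS score vectors when the weights have unit norm. It is written in Python with NumPy rather than R, it is not sparse PLS, and everything in it is an assumption made for illustration: the synthetic data only mimics the shapes in the question (42 x 8 and 42 x 6, with the second block constant within groups of rows), and the function names are hypothetical.

```python
import numpy as np


def first_pls_covariance(X, Y):
    """Largest singular value of the sample cross-covariance matrix of X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    cross_cov = Xc.T @ Yc / (X.shape[0] - 1)
    return float(np.linalg.svd(cross_cov, compute_uv=False)[0])


def bootstrap_interval(X, Y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling rows of X and Y together."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample row indices with replacement
        stats[b] = first_pls_covariance(X[idx], Y[idx])
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Synthetic stand-in data (NOT the real DataA/DataB).
rng = np.random.default_rng(42)
n = 42
group = np.arange(n) // 11                     # groups of 11, 11, 11 and 9 rows
levels = rng.normal(size=(4, 6))               # one row of values per group
Y = levels[group]                              # 42 x 6, constant within a group
X = rng.normal(size=(n, 8)) + 0.5 * Y[:, :1]   # 42 x 8, loosely related to Y

point = first_pls_covariance(X, Y)
lower, upper = bootstrap_interval(X, Y)
print(f"estimate={point:.3f}, 95% percentile interval=({lower:.3f}, {upper:.3f})")
```

Because the second block only takes a few distinct row patterns, it may be worth comparing this row-level resampling with a scheme that resamples whole groups; which one is appropriate depends on how the data were collected.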

You should also ask yourself whether this "discrete" representation is accurate for your data. It may be wrong if the original variables are categorical, in which case you should use dummy variables.
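For reference, dummy coding replaces one categorical column with a set of 0/1 indicator columns, one per level. The sketch below is plain Python with a hypothetical helper `dummy_encode` and made-up codes. Dropping the first level is an optional choice assumed here: it avoids an exact linear dependence among the indicators once the columns are centered.

```python
def dummy_encode(values, drop_first=True):
    """Turn a list of category labels into 0/1 indicator rows.

    Returns the levels that received a column and the encoded rows.
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# Hypothetical categorical column stored as numeric codes.
codes = [1, 1, 2, 3, 2]
columns, encoded = dummy_encode(codes)

print(columns)   # levels that got an indicator column
print(encoded)   # one 0/1 row per observation
```

The point of the recoding is that the codes 1, 2, 3 no longer imply an ordering or equal spacing between categories, which a numeric column would otherwise suggest to PLS or CCA.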
