
Perform linear regression in R with data from SAP HANA database

I am trying to import a dataset into R to fit a linear regression model, but I am unsure about my code because I am new to R. The dataset has 5000+ rows and looks like this:

power consumption cputi dbsu

as the column names, with integers such as the following as the values in those columns:

132 25 654

The SQL code I wrote to call the R function is:

CREATE COLUMN TABLE "PREDICTIVE ANALYSIS" LIKE "ANAGAPPAN.POWER_CONSUMPTION" WITH NO DATA;

SELECT POWER_APP, POWER_DB, CPUTI, DBTI, DBSU
FROM "ANAGAPPAN.POWER_CONSUMPTION";

DROP PROCEDURE USE_LM;

CREATE PROCEDURE USE_LM( IN train "ANAGAPPAN.POWER_CONSUMPTION", OUT result "PREDICTIVE ANALYSIS")
LANGUAGE RLANG AS
BEGIN
library(lm)
model_app <- lm( POWER_APP ~ CPUTI + DBTI + DBSU + KBYTES_TRANSFERRED, data = train )
colnames(datOut) <- c("POWER_APP", "CPUTI", "DBTI", "DBSU", "DBSU")
PREDICTIVE ANALYSIS <- as.data.frame( lm(model_App))
END;

The result is a message saying the procedure was created, but I am unable to run the linear model on the data. How do I invoke the linear model?

Although I'm not familiar with SAP products, I will have a stab at the R code, which I assume is everything between BEGIN and END;.

library(lm)

is incorrect, as mentioned by @Olli. To access R's linear model capabilities, you have to load nothing at all. lm() is part of the stats package, which is loaded by default (this may not be true if R is called in --vanilla mode).
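As a quick check of the point above, the following sketch can be run in a plain R session (outside HANA) to see where lm() comes from. It assumes a default R startup; the variable names are only illustrative.

```r
# lm() should be visible without any library() call in a default session.
lm_is_available <- exists("lm")

# Name of the namespace that defines lm().
lm_home <- environmentName(environment(lm))

# TRUE when the stats package is attached to the search path.
stats_attached <- "package:stats" %in% search()

# library(lm) asks for a *package* called "lm". This is expected to be FALSE
# unless a package with that name happens to be installed on the machine.
has_lm_package <- requireNamespace("lm", quietly = TRUE)

print(list(lm_is_available = lm_is_available,
           lm_home = lm_home,
           stats_attached = stats_attached,
           has_lm_package = has_lm_package))
```

If stats_attached comes back FALSE in your environment, calling stats::lm() explicitly should still work.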

model_app <- lm( POWER_APP ~ CPUTI + DBTI + DBSU + KBYTES_TRANSFERRED, data = train )

appears to be OK, at least syntactically.

For

colnames(datOut) <- c("POWER_APP", "CPUTI", "DBTI", "DBSU", "DBSU")

I can't see where you define datOut. If this variable is not created by the database, it does not exist, and R should complain along the lines of

Error in colnames(notExist) <- "x" : object 'notExist' not found
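The error can be reproduced in a plain R session. This is a minimal sketch that assumes no object named datOut exists yet; the one-row data frame used afterwards is made up from the sample values in the question, purely for illustration.

```r
# Assigning column names to an object that was never created fails.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    colnames(datOut) <- c("POWER_APP", "CPUTI", "DBSU")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Once the object exists, the same assignment works.
# (Illustrative values only, taken from the sample row in the question.)
datOut <- data.frame(a = 132, b = 25, c = 654)
colnames(datOut) <- c("POWER_APP", "CPUTI", "DBSU")
print(datOut)
```

Note that column names should also be unique; the original line lists "DBSU" twice.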

I will assume you want to predict (means) based on a model. The line

PREDICTIVE ANALYSIS <- as.data.frame( lm(model_App))

will not work for three reasons: R variable names should not contain spaces, as.data.frame will not work on an lm object, and model_App doesn't exist (notice the case; the model was stored as model_app). I think you should do something along the lines of

# based on http://help.sap.com/hana/sap_hana_r_integration_guide_en.pdf
# you have to specify variable result which will be exported to the database
result <- as.data.frame(predict(model_app))

You can try it out.

x <- 1:10
y <- rnorm(10)

mdl <- lm(y ~ x)

as.data.frame(predict(mdl))

   predict(mdl)
1    0.47866685
2    0.34418219
3    0.20969753
4    0.07521287
5   -0.05927180
6   -0.19375646
7   -0.32824112
8   -0.46272579
9   -0.59721045
10  -0.73169511
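Putting the pieces together, here is a rough sketch of what the R body between BEGIN and END; might look like, runnable outside HANA. The train data frame is simulated as a stand-in for the table HANA would pass in, and the columns chosen for result (including the name POWER_APP_PREDICTED) are assumptions. Inside HANA, the columns of result would have to match the table type declared for the OUT parameter, which I cannot verify here.

```r
# Simulated stand-in for the `train` input table (NOT real data).
set.seed(1)
n <- 50
train <- data.frame(
  CPUTI              = runif(n, 10, 50),
  DBTI               = runif(n, 100, 900),
  DBSU               = runif(n, 200, 800),
  KBYTES_TRANSFERRED = runif(n, 1000, 5000)
)
train$POWER_APP <- 50 + 2 * train$CPUTI + 0.05 * train$DBTI + rnorm(n, sd = 5)

# --- the part that would sit between BEGIN and END; ---

# No library() call needed: lm() and predict() come from the stats package.
model_app <- lm(POWER_APP ~ CPUTI + DBTI + DBSU + KBYTES_TRANSFERRED,
                data = train)

# `result` is the name of the OUT parameter in the procedure signature.
# Column layout here is hypothetical; adapt it to the output table type.
result <- data.frame(
  POWER_APP           = train$POWER_APP,
  POWER_APP_PREDICTED = as.numeric(predict(model_app))
)

# --- end of procedure body ---

print(head(result))
```

Calling predict() without newdata returns fitted values for the training rows, which is what the one-liner above does as well.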
