Trouble with rpy2 and rpart: passing data correctly from Python to R
I am trying to run rpart through rpy2 using Python 2.6.5 and R 10.0.
I create a data frame in Python and pass it to rpart, but I get this error:
Error in function (x) : binary operation on non-conformable arrays
Traceback (most recent call last):
File "partitioningSANDBOX.py", line 86, in <module>
model=r.rpart(**rpart_params)
File "build/bdist.macosx-10.3-fat/egg/rpy2/robjects/functions.py", line 83, in __call__
File "build/bdist.macosx-10.3-fat/egg/rpy2/robjects/functions.py", line 35, in __call__
rpy2.rinterface.RRuntimeError: Error in function (x) : binary operation on non-conformable arrays
Can anyone help me work out what I am doing wrong to trigger this error?
The relevant part of my code is:
import numpy as np
import rpy2
import rpy2.robjects as rob
import rpy2.robjects.numpy2ri
#Fire up the interface to R
r = rob.r
r.library("rpart")
datadict = dict(zip(['responsev','predictorv'],[cLogEC,csplitData]))
Rdata = r['data.frame'](**datadict)
Rformula = r['as.formula']('responsev ~.')
#Generate an RPART model in R.
Rpcontrol = r['rpart.control'](minsplit=10, xval=10)
rpart_params = {'formula': Rformula,
                'data': Rdata,
                'control': Rpcontrol}
model = r.rpart(**rpart_params)
The two variables cLogEC and csplitData are numpy arrays of float type.
For reference, my data frame looks like this:
In [2]: print Rdata
------> print(Rdata)
responsev predictorv
1 0.6020600 312
2 0.3010300 300
3 0.4771213 303
4 0.4771213 249
5 0.9242793 239
6 1.1986571 297
7 0.7075702 287
8 1.8115750 270
9 0.6020600 296
10 1.3856063 248
11 0.6127839 295
12 0.3010300 283
13 1.1931246 345
14 0.3010300 270
15 0.3010300 251
16 0.3010300 246
17 0.3010300 273
18 0.7075702 252
19 0.4771213 252
20 0.9294189 223
21 0.6127839 252
22 0.7075702 267
23 0.9294189 252
24 0.3010300 378
25 0.3010300 282
The formula looks like this:
In [3]: print Rformula
------> print(Rformula)
responsev ~ .
The problem comes from R-specific code inside rpart (specifically the following block, and the last line in particular:
m <- match.call(expand.dots = FALSE)
m$model <- m$method <- m$control <- NULL
m$x <- m$y <- m$parms <- m$... <- NULL
m$cost <- NULL
m$na.action <- na.action
m[[1L]] <- as.name("model.frame")
m <- eval(m, parent.frame())
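).
One way around the problem is to avoid entering that block of code at all (see below); another might be to build a nested evaluation from Python, so that parent.frame() behaves as rpart expects. This is not as simple as one would hope, but perhaps I will find time to make it easier in the future.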
import numpy as np
from rpy2.robjects import DataFrame, Formula
import rpy2.robjects.numpy2ri as npr  # numpy-to-R conversion
from rpy2.robjects.packages import importr
# Load the R packages as Python objects.
rpart = importr('rpart')
stats = importr('stats')
# Example data standing in for the arrays from the question.
cLogEC = np.random.uniform(size=10)
csplitData = np.array(range(10), 'i')
dataf = DataFrame({'responsev': cLogEC,
                   'predictorv': csplitData})
formula = Formula('responsev ~.')
# Passing model= hands rpart a ready-made model frame, so it does not
# enter the match.call()/eval() block shown above.
rpart.rpart(formula=formula, data=dataf,
            control=rpart.rpart_control(minsplit=10, xval=10),
            model=stats.model_frame(formula, data=dataf))
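The other option mentioned above, a nested evaluation, could be sketched roughly as follows. Treat this as an untested sketch under stated assumptions, not as the recommended rpy2 approach: the helper names build_rpart_call and fit_rpart_by_name are made up for illustration, and the default name Rdata is arbitrary. The idea is to bind the data frame to a name in R's global environment and evaluate the call as R source text, so that the data argument reaches rpart as a symbol that eval(m, parent.frame()) can look up.

```python
def build_rpart_call(data_name, formula, **control):
    """Return R source text for an rpart() call that refers to the data by name.

    Hypothetical helper: it only formats a string and does not touch R.
    """
    # Sort the control options so the generated text is deterministic.
    control_args = ", ".join(
        "%s = %s" % (key, control[key]) for key in sorted(control)
    )
    return "rpart(%s, data = %s, control = rpart.control(%s))" % (
        formula, data_name, control_args)


def fit_rpart_by_name(dataf, formula="responsev ~ .", data_name="Rdata",
                      **control):
    """Bind dataf to data_name in R's global environment, then fit there.

    Hypothetical helper; assumes rpy2 and the R package rpart are installed.
    """
    # Imported lazily so build_rpart_call() can be used without rpy2.
    import rpy2.robjects as rob
    rob.r("library(rpart)")
    # Make the data frame visible to R under a plain variable name.
    rob.globalenv[data_name] = dataf
    # Evaluate the call as R code; the data argument is now a symbol.
    return rob.r(build_rpart_call(data_name, formula, **control))


# Show the R call that would be evaluated for the settings in the question.
print(build_rpart_call("Rdata", "responsev ~ .", minsplit=10, xval=10))
```

With the data frame from the snippet above, a call such as fit_rpart_by_name(dataf, minsplit=10, xval=10) would then run the fit entirely on the R side. The trade-off is that the data frame is left behind in R's global environment under the chosen name.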