Matrix multiplication in R: requires numeric/complex matrix/vector arguments

Question

I am using the BreastCancer dataset from the mlbench package, and I am trying to do the following matrix multiplication as part of a logistic regression.

I took the features in the first 10 columns and created a parameter vector called theta:

X <- BreastCancer[, 1:10]
theta <- data.frame(rep(1, 10))

Then I did the following matrix multiplication:

constant <- as.matrix(X) %*% as.vector(theta[, 1])

However, I get the following error:

Error in as.matrix(X) %*% as.vector(theta[, 1]) : 
  requires numeric/complex matrix/vector arguments

Do I need to convert the matrix to double with as.numeric(X) first? The values in X look like strings, since they are printed in double quotes.

Answer 1

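Matrix multiplication operators/functions such as "%*%", crossprod and tcrossprod expect matrices with "numeric", "complex" or "logical" mode. Your matrix, however, has "character" mode.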

library(mlbench)
data(BreastCancer)
X <- as.matrix(BreastCancer[, 1:10])
mode(X)
#[1] "character"
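
The mode rule can also be checked on a tiny example that does not need the dataset. This is only a minimal sketch: the matrices num, lgl and chr and the vector ones are made up for illustration and do not appear in the question.

```r
# Three 2 x 2 matrices that differ only in their mode (illustrative names)
num <- matrix(1:4, nrow = 2)                 # mode "numeric"
lgl <- num > 2                               # mode "logical"
chr <- matrix(as.character(1:4), nrow = 2)   # mode "character"

ones <- c(1, 1)

num_result <- num %*% ones   # works
lgl_result <- lgl %*% ones   # works: TRUE/FALSE are treated as 1/0

# The character matrix fails; capture the error message instead of stopping
chr_result <- tryCatch(
  chr %*% ones,
  error = function(e) conditionMessage(e)
)
chr_result
```

Here chr_result ends up holding the error message text rather than a product, which is the same failure the question ran into.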

You may be surprised, because the dataset appears to hold numeric data:

head(BreastCancer[, 1:10])
#       Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size
#1 1000025            5         1          1             1            2
#2 1002945            5         4          4             5            7
#3 1015425            3         1          1             1            2
#4 1016277            6         8          8             1            3
#5 1017023            4         1          1             3            2
#6 1017122            8        10         10             8            7
#  Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses
#1           1           3               1       1
#2          10           3               2       1
#3           2           3               1       1
#4           4           3               7       1
#5           1           3               1       1
#6          10           9               7       1

But you are being misled by the printing style. These columns are in fact characters or factors:

lapply(BreastCancer[, 1:10], class)
#$Id
#[1] "character"
#
#$Cl.thickness
#[1] "ordered" "factor" 
#
#$Cell.size
#[1] "ordered" "factor" 
#
#$Cell.shape
#[1] "ordered" "factor" 
#
#$Marg.adhesion
#[1] "ordered" "factor" 
#
#$Epith.c.size
#[1] "ordered" "factor" 
#
#$Bare.nuclei
#[1] "factor"
#
#$Bl.cromatin
#[1] "factor"
#
#$Normal.nucleoli
#[1] "factor"
#
#$Mitoses
#[1] "factor"

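When you call as.matrix, these columns are all coerced to "character" (for a detailed explanation, see 'R: Why am I not getting type or class "factor" after converting columns to factor?').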
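
The coercion is easy to reproduce on a small data frame. The sketch below uses a made-up data frame called toy (not part of the original answer) with one character, one factor and one numeric column, and compares the mode of the resulting matrices.

```r
# A made-up data frame mixing a character, a factor and a numeric column
toy <- data.frame(
  id    = c("a1", "a2", "a3"),
  score = factor(c("5", "10", "5")),
  x     = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

m_all <- as.matrix(toy)        # one non-numeric column is enough
mode(m_all)                    # "character"

m_num <- as.matrix(toy["x"])   # only the numeric column
mode(m_num)                    # "numeric"
```

A matrix can hold only one type, so a single character or factor column forces every entry to become a string.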

So to do the matrix multiplication, we need to coerce these columns to "numeric" correctly.

dat <- BreastCancer[, 1:10]

## character to numeric
dat[[1]] <- as.numeric(dat[[1]])

## factor to numeric
dat[2:10] <- lapply( dat[2:10], function (x) as.numeric(levels(x))[x] )

## get the matrix
X <- data.matrix(dat)
mode(X)
#[1] "numeric"

Now you can do things like a matrix-vector multiplication.

## some possible matrix-vector multiplications
beta <- runif(10)
yhat <- X %*% beta

## add prediction back to data frame
## (drop() turns the n-by-1 matrix into a plain vector column)
dat$prediction <- drop(yhat)

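However, I doubt this is the right way to obtain predicted values for your logistic regression model, because when you build a model with factors, the model matrix is not the X above but a dummy matrix. I strongly recommend using predict.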
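
To illustrate the point about the dummy matrix, here is a hedged sketch on simulated data. The data frame toy, its columns grade, x and y, and the coefficients used to simulate it are all invented for this example. It assumes a plain unordered factor with the default treatment contrasts, so the ordered factors in BreastCancer would be coded differently.

```r
set.seed(1)
n <- 200

# Simulated data: one unordered factor and one numeric predictor (hypothetical)
toy <- data.frame(
  grade = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                 levels = c("low", "mid", "high")),
  x = rnorm(n)
)
eta <- -0.5 + 1.0 * (toy$grade == "mid") + 2.0 * (toy$grade == "high") + 0.8 * toy$x
toy$y <- rbinom(n, 1, plogis(eta))

fit <- glm(y ~ grade + x, data = toy, family = binomial)

# The model matrix has an intercept and dummy columns, not the raw factor
M <- model.matrix(fit)
colnames(M)

# Manual prediction only matches when this dummy matrix is used
p_manual  <- plogis(drop(M %*% coef(fit)))
p_predict <- predict(fit, type = "response")
all.equal(unname(p_manual), unname(p_predict))
```

The manual product agrees with predict only because M is the dummy-coded model matrix. Multiplying the raw numeric columns by the coefficients would not reproduce the fitted probabilities, which is why predict is the safer route.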

This line also worked for me: as.matrix(sapply(dat, as.numeric))

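It seems you were lucky. The dataset happens to have factor levels that coincide with the numeric values. In general, converting a factor to numeric should use the method I used. Compare: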

f <- gl(4, 2, labels = c(12.3, 0.5, 2.9, -11.1))
f
#[1] 12.3  12.3  0.5   0.5   2.9   2.9   -11.1 -11.1
#Levels: 12.3 0.5 2.9 -11.1

as.numeric(f)
#[1] 1 1 2 2 3 3 4 4

as.numeric(levels(f))[f]
#[1] 12.3  12.3  0.5   0.5   2.9   2.9   -11.1 -11.1

The documentation page ?factor covers this.
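
As a small self-contained check, the sketch below compares the two conversions that recover the original values with the internal codes that data.matrix returns for a factor column. The factor f here is a fresh made-up example, and the variable names are illustrative only.

```r
# A factor whose labels are numbers stored as strings (made-up values)
f <- factor(c("12.3", "0.5", "2.9", "0.5"))

# Two ways to recover the values encoded in the labels
via_levels <- as.numeric(levels(f))[f]
via_char   <- as.numeric(as.character(f))
identical(via_levels, via_char)   # TRUE

# data.matrix() replaces a factor by its internal integer codes instead
codes <- data.matrix(data.frame(f = f))[, "f"]
codes
```

So data.matrix or sapply(dat, as.numeric) applied directly to factor columns only gives the intended numbers when the codes happen to equal the labels, as they do for levels 1 to 10.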

Answer 1 (score 12, accepted, 2016-10-30 02:13:11)
