model.matrix raises memory allocation error
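I am using model.matrix to create a number of columns from an existing data frame. The goal is to create many columns, each named after a distinct value of one feature column (my_one_feature). That is, if my_one_feature is a categorical variable with values {cat_1, cat_2, cat_3}, I want to generate 3 additional columns named cat_1, cat_2, cat_3, each taking the value 0 or 1 depending on whether that value is present in the corresponding row.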
mm <- model.matrix(~factor(my_one_feature)-1,data=my_data_frame)
Then I can do:
cbind(my_data_frame,mm)
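I think this does exactly what I described. However, for large data and/or a feature with a large number of distinct values, it produces a memory allocation error: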
cannot allocate vector of size 50 Gb
I know the resulting matrix will be sparse. How can I avoid this memory allocation problem?
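To see why the dense result gets so large, here is a rough back-of-the-envelope sketch. model.matrix returns an ordinary numeric matrix, so every cell costs 8 bytes whether it holds 0 or 1. The helper name dense_onehot_bytes and the row/level counts below are made up for illustration; they are not taken from the actual data.

```r
# Approximate size of a dense one-hot matrix:
# one 8-byte double per cell, rows x levels cells in total.
# dense_onehot_bytes is a hypothetical helper, not part of any package.
dense_onehot_bytes <- function(n_rows, n_levels) {
  as.numeric(n_rows) * as.numeric(n_levels) * 8
}

# Illustrative sizes only (assumed, not the poster's real data):
# one million rows and 5000 distinct values of my_one_feature.
dense_onehot_bytes(1e6, 5000) / 2^30   # size in GiB

# The 7-row, 4-level toy example is tiny by comparison.
dense_onehot_bytes(7, 4)               # size in bytes
```

A sparse column-compressed matrix, by contrast, stores only the non-zero entries (one per row for a single one-hot encoded factor), so its size grows with the number of rows rather than with rows times levels.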
Here is an example with only 7 rows and 4 original features:
f1<-c('f1_1','f1_2','f1_1','f1_3','f1_3','f1_1','f1_4')
f2<-c(1,2,3,4,2,4,2)
f3<-c(1,2,3,4,5,6,7)
f4<-c(0,0,1,1,1,0,1)
my_data_frame<-data.frame(f1,f2,f3,f4)
which looks like:
my_data_frame
f1 f2 f3 f4
1 f1_1 1 1 0
2 f1_2 2 2 0
3 f1_1 3 3 1
4 f1_3 4 4 1
5 f1_3 2 5 1
6 f1_1 4 6 0
7 f1_4 2 7 1
mm<-sparse.model.matrix(~factor(f1)-1,data=my_data_frame)
which looks like:
7 x 4 sparse Matrix of class "dgCMatrix"
factor(f1)f1_1 factor(f1)f1_2 factor(f1)f1_3 factor(f1)f1_4
1 1 . . .
2 . 1 . .
3 1 . . .
4 . . 1 .
5 . . 1 .
6 1 . . .
7 . . . 1
How can I combine my_data_frame with mm so that the resulting object has all of the columns (f1, f2, f3, f4, factor(f1)f1_1, factor(f1)f1_2, factor(f1)f1_3, factor(f1)f1_4) and, of course, 7 rows?
OK, your answer gives the following result in my RStudio:
> my_data_frame <- data.frame(
+ f1=c('f1_1','f1_2','f1_1','f1_3','f1_3','f1_1','f1_4'),
+ f2=c(1,2,3,4,2,4,2),
+ f3=c(1,2,3,4,5,6,7),
+ f4=c(0,0,1,1,1,0,1))
> library("Matrix")
> mm <- sparse.model.matrix(~factor(f1)-1,
+ data=my_data_frame)
> new_data_frame <- cbind(Matrix(as.matrix(my_data_frame[,-1])),
+ mm)
> dim(new_data_frame)
[1] 1 2
> str(new_data_frame)
List of 2
$ :Formal class 'dgeMatrix' [package "Matrix"] with 4 slots
.. ..@ x : num [1:21] 1 2 3 4 2 4 2 1 2 3 ...
.. ..@ Dim : int [1:2] 7 3
.. ..@ Dimnames:List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:3] "f2" "f3" "f4"
.. ..@ factors : list()
$ :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:7] 0 2 5 1 3 4 6
.. ..@ p : int [1:5] 0 3 4 6 7
.. ..@ Dim : int [1:2] 7 4
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:7] "1" "2" "3" "4" ...
.. .. ..$ : chr [1:4] "factor(f1)f1_1" "factor(f1)f1_2" "factor(f1)f1_3" "factor(f1)f1_4"
.. ..@ x : num [1:7] 1 1 1 1 1 1 1
.. ..@ factors : list()
- attr(*, "dim")= int [1:2] 1 2
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "" "mm"
>
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Lithuanian_Lithuania.1257 LC_CTYPE=Lithuanian_Lithuania.1257 LC_MONETARY=Lithuanian_Lithuania.1257 LC_NUMERIC=C
[5] LC_TIME=Lithuanian_Lithuania.1257
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Matrix_1.2-2
loaded via a namespace (and not attached):
[1] grid_3.1.3 lattice_0.20-30 tools_3.1.3
>
Set up the data:
my_data_frame <- data.frame(
f1=c('f1_1','f1_2','f1_1','f1_3','f1_3','f1_1','f1_4'),
f2=c(1,2,3,4,2,4,2),
f3=c(1,2,3,4,5,6,7),
f4=c(0,0,1,1,1,0,1))
Now use sparse.model.matrix for the categorical feature:
library("Matrix")
mm <- sparse.model.matrix(~factor(f1)-1,
data=my_data_frame)
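The columns come out named factor(f1)f1_1 and so on, whereas the question asks for columns named after the bare values. If that matters, one possible sketch is to strip the prefix afterwards. The name mm_named is made up here so that mm itself stays as it was, and the data frame is cut down to two columns just to keep the example short.

```r
library(Matrix)

# Shortened copy of the example data (only f1 is needed for the encoding)
my_data_frame <- data.frame(
  f1 = c('f1_1','f1_2','f1_1','f1_3','f1_3','f1_1','f1_4'),
  f2 = c(1,2,3,4,2,4,2))

mm <- sparse.model.matrix(~ factor(f1) - 1, data = my_data_frame)

# mm_named is a hypothetical name; the original mm is left untouched.
# fixed = TRUE treats "factor(f1)" as a literal string, not a regex.
mm_named <- mm
colnames(mm_named) <- sub("factor(f1)", "", colnames(mm), fixed = TRUE)
colnames(mm_named)
```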
Combine this back with the numeric predictors (coercing data.frame -> matrix -> Matrix):
new_data_frame <- cbind(Matrix(as.matrix(my_data_frame[,-1])),
mm)
Results:
dim(new_data_frame)
## [1] 7 7
str(new_data_frame)
## Formal class 'dgeMatrix' [package "Matrix"] with 4 slots
## ..@ x : num [1:49] 1 2 3 4 2 4 2 1 2 3 ...
## ..@ Dim : int [1:2] 7 7
## ..@ Dimnames:List of 2
## .. ..$ : chr [1:7] "1" "2" "3" "4" ...
## .. ..$ : chr [1:7] "f2" "f3" "f4" "factor(f1)f1_1" ...
## ..@ factors : list()
object.size(new_data_frame) ## 1596 bytes
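The result does not contain the original f1 column, because a matrix cannot hold heterogeneous types - but there would be no way to use that column in its raw form for numerical modeling and prediction anyway...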
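Two side notes, offered as a sketch rather than a definitive fix. First, the 1 x 2 list in the OP's output is most likely a version effect: as far as I know, base cbind only started dispatching to the S4 methods for Matrix objects in R 3.2.0, so on R 3.1.3 it just wraps the two objects in a list. Calling cbind2 directly should sidestep that. Second, the result shown above is a dense dgeMatrix; asking for sparse = TRUE when converting the numeric columns should keep the whole thing sparse. The names num_part and combined are made up for this example.

```r
library(methods)  # provides the cbind2 generic and is()
library(Matrix)

my_data_frame <- data.frame(
  f1 = c('f1_1','f1_2','f1_1','f1_3','f1_3','f1_1','f1_4'),
  f2 = c(1,2,3,4,2,4,2),
  f3 = c(1,2,3,4,5,6,7),
  f4 = c(0,0,1,1,1,0,1))

# One-hot encode the categorical column as a sparse matrix
mm <- sparse.model.matrix(~ factor(f1) - 1, data = my_data_frame)

# Convert the numeric columns to a sparse Matrix as well
# (num_part is a hypothetical name)
num_part <- Matrix(as.matrix(my_data_frame[, -1]), sparse = TRUE)

# cbind2 calls the Matrix method directly, without relying on
# base cbind's S4 dispatch
combined <- cbind2(num_part, mm)

dim(combined)
is(combined, "sparseMatrix")
```

For real data, whether the numeric part is worth storing sparsely depends on how many zeros it contains; the one-hot part is where the savings come from.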
Session info (the OP is using R 3.1.3 / Windows 8 x64 / Lithuanian locale / Matrix_1.2-2 / tools_3.1.3):
R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Matrix_1.2-2
loaded via a namespace (and not attached):
[1] grid_3.2.1 lattice_0.20-33