Spark MLlib memory error on SVD (single machine)
I have a large data file (around 4 GB), and I am analyzing it using Spark on a single PC.
scala> x
res29: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix@5a86096a
scala> x.numRows
res27: Long = 302529
scala> x.numCols
res28: Long = 1828
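For scale, here is a back-of-the-envelope size for x. It assumes every entry is held as a dense 8-byte Double, which may not match how x was actually built (sparse rows or JVM object overhead would change the figure). The calculation is 302529 × 1828 entries × 8 bytes = 4,424,184,096 bytes, about 4.12 GiB, which is in the same ballpark as the ~4 GB file. The object name DenseSizeEstimate is only illustrative.

```scala
// Rough size of a rows x cols matrix if every entry is a dense 8-byte Double.
// Assumption: dense storage; real RDD/JVM memory use will differ.
object DenseSizeEstimate {
  def denseBytes(rows: Long, cols: Long): Long = rows * cols * 8L

  def main(args: Array[String]): Unit = {
    val bytes = denseBytes(302529L, 1828L) // numRows x numCols from above
    val gib = bytes / (1024.0 * 1024.0 * 1024.0)
    println(s"$bytes bytes, about ${math.round(gib * 100) / 100.0} GiB")
  }
}
```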
When I try to compute the principal components, I get a memory error:
scala> val pc: Matrix = x.computePrincipalComponents(2)
15/03/30 14:55:22 INFO ContextCleaner: Cleaned shuffle 1
java.lang.OutOfMemoryError: Java heap space
at breeze.linalg.svd$.breeze$linalg$svd$$doSVD_Double(svd.scala:92)
at breeze.linalg.svd$Svd_DM_Impl$.apply(svd.scala:39)
at breeze.linalg.svd$Svd_DM_Impl$.apply(svd.scala:38)
at breeze.generic.UFunc$class.apply(UFunc.scala:48)
at breeze.linalg.svd$.apply(svd.scala:22)
at org.apache.spark.mllib.linalg.distributed.RowMatrix.computePrincipalComponents(RowMatrix.scala:380)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
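The trace shows the error inside breeze's svd, called from RowMatrix.computePrincipalComponents. As far as I can tell from the Spark 1.x source, that method first builds the numCols × numCols covariance matrix as a local dense matrix on the driver and then runs the SVD on it there, so the driver's memory need grows with numCols squared rather than with numRows. The sketch estimates that size. The multipliers for extra same-sized matrices (input copy, U, V^T, LAPACK workspace) are assumptions, not measured values, and CovarianceFootprint is a hypothetical name.

```scala
// Rough driver-side footprint of the dense numCols x numCols covariance
// matrix that computePrincipalComponents decomposes with breeze's svd.
// Assumption: the SVD holds several extra matrices of the same size;
// the `copies` values below are illustrative guesses, not measurements.
object CovarianceFootprint {
  def squareDenseBytes(n: Long): Long = n * n * 8L

  def mib(bytes: Long): Double = bytes / (1024.0 * 1024.0)

  def main(args: Array[String]): Unit = {
    val n = 1828L // numCols from above
    val one = squareDenseBytes(n)
    println(s"one ${n}x$n matrix: ${mib(one)} MiB")
    for (copies <- Seq(4, 8, 12))
      println(s"$copies such matrices: ${mib(one * copies)} MiB")
  }
}
```

A single 1828 × 1828 matrix is 26,732,672 bytes (about 25.5 MiB), so a dozen of them already approach 300 MiB. In local mode the tasks run inside the same JVM as the driver, so any cached partitions of x compete with these matrices for the same heap.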
How can I solve this?
If you happen to have more RAM than Spark currently uses, you can try increasing the Java heap size with the command-line option --driver-memory 8g (assuming "local" mode here, in which the calculation is done by the driver program). The default is only 512m.
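To confirm the setting took effect, launch the shell as `spark-shell --driver-memory 8g` (or put spark.driver.memory in conf/spark-defaults.conf) and check the JVM's maximum heap from the REPL. Setting spark.driver.memory on a SparkConf inside an already running shell does not help, because the driver JVM has already started by then. The snippet below is a minimal check; HeapCheck is just an illustrative name.

```scala
// Report the maximum heap this JVM may use. In local mode this is the
// driver JVM, which also runs the tasks. Paste into spark-shell or run standalone.
object HeapCheck {
  def maxHeapMiB: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"max heap: $maxHeapMiB MiB")
}
```

The reported value is usually somewhat below the requested size, since the JVM excludes part of the young generation from maxMemory, but it should be close to 8 GB rather than 512 MB if the option was picked up.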