Accord.Net - CacheSize on LibLinear
I'm attempting to classify some inputs (text classification: 10,000+ examples, and 100,000+ features).
I've read that using LibLinear is far faster / more memory-efficient for such tasks, so I've ported my LibSvm classifier to Accord.Net, like so:
//SVM Settings
var teacher = new MulticlassSupportVectorLearning<Linear, Sparse<double>>()
{
//Using LIBLINEAR's L2-loss SVC dual for each SVM
Learner = (p) => new LinearDualCoordinateDescent<Linear, Sparse<double>>()
{
Loss = Loss.L2,
Complexity = 1,
}
};
var inputs = allTerms.Select(t => new Sparse<double>(t.Sentence.Select(s => s.Index).ToArray(), t.Sentence.Select(s => (double)s.Value).ToArray())).ToArray();
var classes = allTerms.Select(t => t.Class).ToArray();
//Train the model
var model = teacher.Learn(inputs, classes);
At the point of .Learn() I get an instant OutOfMemoryException.
I've seen there's a CacheSize setting in the documentation; however, I cannot find where I can lower this setting, as is shown in many examples.
One possible reason: I'm using the 'hash trick' instead of indices. Is Accord.Net attempting to allocate an array of the full hash space (probably close to int.MaxValue)? If so, is there any way to avoid this?
Any help is most appreciated!
Allocating hash space for 10,000+ documents with 100,000+ features will take at least 4 GB of memory, which may run into the AppDomain memory limit and the CLR object size limit. Many projects are built with the 32-bit platform preference by default, which does not allow allocating objects larger than 2 GB.
I managed to overcome this by removing the 32-bit platform preference (go to project properties -> Build and uncheck "Prefer 32-bit").
After that, to allow creation of objects taking more than 2 GB of memory, add this line to your configuration file:
<runtime>
<gcAllowVeryLargeObjects enabled="true" />
</runtime>
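For context, a minimal App.config showing where that element belongs: the <runtime> section sits directly under the root <configuration> element (the surrounding elements here are the standard .NET config skeleton, not something specific to this project).

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <!-- Allow single objects (e.g. arrays) larger than 2 GB; only honored on 64-bit. -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```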
Be aware that if you add this line but keep the 32-bit platform build preference, you will still get an exception, as your project will not be able to allocate an array of that size.
This is how you tune the CacheSize:
//SVM Settings
var teacher = new MulticlassSupportVectorLearning<Linear, Sparse<double>>()
{
Learner = (p) => new SequentialMinimalOptimization<Linear, Sparse<double>>()
{
CacheSize = 1000,
Complexity = 1,
}
};
var inputs = allTerms.Select(t => new Sparse<double>(t.Sentence.Select(s => s.Index).ToArray(), t.Sentence.Select(s => (double)s.Value).ToArray())).ToArray();
var classes = allTerms.Select(t => t.Class).ToArray();
//Train the model
var model = teacher.Learn(inputs, classes);
This way of constructing an SVM does cope with the Sparse<double> data structure, but it is not using LibLinear. If you open the Accord.NET repository and look at the SVM solving algorithms with LibLinear support (LinearCoordinateDescent, LinearNewtonMethod), you will see no CacheSize property.
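For illustration, a sketch of how such a linear solver would be plugged into the multiclass teacher instead of SequentialMinimalOptimization. This assumes Accord.NET 3.x, where LinearNewtonMethod (an L2-regularized primal solver) accepts the same generic kernel/input parameters; note it exposes Complexity but, as described above, no CacheSize, since linear solvers do not build a kernel cache.

```csharp
// Hypothetical sketch (Accord.NET 3.x assumed): swapping in a LibLinear-style
// primal solver. There is no CacheSize to tune here, because no kernel matrix
// is cached for linear machines.
var teacher = new MulticlassSupportVectorLearning<Linear, Sparse<double>>()
{
    Learner = (p) => new LinearNewtonMethod<Linear, Sparse<double>>()
    {
        Complexity = 1
        // No CacheSize property exists on this class.
    }
};
```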