
Accord.Net - CacheSize on LibLinear

I'm attempting to classify some inputs (text classification: 10,000+ examples and 100,000+ features).

I've read that LibLinear is far faster and more memory-efficient for such tasks, so I've ported my LibSvm classifier to Accord.NET, like so:

        //SVM Settings
        var teacher = new MulticlassSupportVectorLearning<Linear, Sparse<double>>()
        {
            //Using LIBLINEAR's L2-loss SVC dual for each SVM
            Learner = (p) => new LinearDualCoordinateDescent<Linear, Sparse<double>>()
            {
                Loss = Loss.L2,
                Complexity = 1,
            }
        };

        var inputs = allTerms.Select(t => new Sparse<double>(t.Sentence.Select(s => s.Index).ToArray(), t.Sentence.Select(s => (double)s.Value).ToArray())).ToArray();

        var classes = allTerms.Select(t => t.Class).ToArray();

        //Train the model
        var model = teacher.Learn(inputs, classes);

At the point of .Learn() I get an instant OutOfMemoryException.

I've seen there's a CacheSize setting in the documentation; however, I cannot find where to lower this setting, as is shown in many examples.

One possible reason: I'm using the 'hash trick' instead of dictionary indices. Is Accord.Net attempting to allocate an array over the full hash space (probably close to int.MaxValue)? If so, is there any way to avoid this?
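For reference, a minimal sketch of what I mean by hashed indices (illustrative only, not my exact code; any stable string hash would behave the same way):

        //Illustrative only: the feature index is a hash of the token rather than a
        //dictionary position, so an index can land anywhere up to int.MaxValue.
        static int FeatureIndex(string token)
        {
            //Mask the sign bit to keep the index non-negative.
            return token.GetHashCode() & 0x7FFFFFFF;
        }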

Any help is most appreciated!

Allocating the hash space for 10,000+ documents with 100,000+ features will take at least 4 GB of memory, which may run into the AppDomain memory limit and the CLR object-size limit. Many projects are built with the 32-bit platform preference by default, which does not allow allocating objects larger than 2 GB. I managed to overcome this by removing the 32-bit platform preference (go to project properties -> Build and uncheck "Prefer 32-bit"). After that, to allow creation of objects taking more than 2 GB of memory, add this to your configuration file:

<runtime>
    <gcAllowVeryLargeObjects enabled="true" />
</runtime>

Be aware that if you add this setting but leave the 32-bit platform build preference enabled, you will still get an exception, as your process will not be able to allocate an array of that size.
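A quick way to double-check that the build is actually running as a 64-bit process (a small sketch using the standard Environment properties; adapt the output to your own logging):

    //Sanity check: gcAllowVeryLargeObjects only helps in a 64-bit process.
    Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
    Console.WriteLine("64-bit OS:      " + Environment.Is64BitOperatingSystem);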

This is how you tune the CacheSize:

    //SVM Settings
    var teacher = new MulticlassSupportVectorLearning<Linear, Sparse<double>>()
    {
        //SequentialMinimalOptimization exposes the CacheSize property
        Learner = (p) => new SequentialMinimalOptimization<Linear, Sparse<double>>()
        {
            CacheSize = 1000,
            Complexity = 1,
        }
    };

    var inputs = allTerms.Select(t => new Sparse<double>(t.Sentence.Select(s => s.Index).ToArray(), t.Sentence.Select(s => (double)s.Value).ToArray())).ToArray();

    var classes = allTerms.Select(t => t.Class).ToArray();

    //Train the model
    var model = teacher.Learn(inputs, classes);

This way of constructing an SVM does cope with the Sparse<double> data structure, but it is not using LibLinear. If you open the Accord.NET repository and look at the SVM solving algorithms with LibLinear support (LinearCoordinateDescent, LinearNewtonMethod), you will see no CacheSize property.
