How can I make this lookup routine more efficient?

Question

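I have two large data sets: search is 340,000 x 1 and field is 348,000 x 2. My goal is to take each element of search, find its position in field(:,1), and collect the corresponding entries of field(:,2) into a new cell array called result.

Using cellfun directly ran out of memory, so I had to split the data set into subsets and then compile the results.

To do that I built the following routine, but it takes a very long time: 2 hours 40 minutes!

My question is: how can I perform this task more efficiently? Do I need to modify my existing code, or do I need a completely different approach?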

function result = bigdatacmp(search,field)

%BIGDATACMP(SEARCH,FIELD) takes strcmp jobs that require excessive amounts
%   of memory and splits them up into manageable subsets. The results of the
%   subsets are then compiled to represent the original set.


tic

subsets = floor(size(search,1)/1000);       %Divides search into subsets
difference = size(search,1) - 1000*subsets; %# of elements in last subset

result = cell(0);                           %Establish empty variables

%Loops through all subsets. Finds location of matches in the first column
%of field. Compiles subset locations. Compiles results from second column
%of field.
for i = 1:subsets

    searchvalues = search(1000*i-999:1000*i);

    Zlogic = cellfun(@(x)(strcmp(x,field(:,1))),...
        searchvalues,'UniformOutput',false);

    result(1000*i-999:1000*i) = cellfun(@(x)(field(x,2)),...
        Zlogic,'UniformOutput',false);
end

%Performs same calculations as in loop, but for the final subset.
Zlogic = cellfun(@(x)(strcmp(x,field(:,1))),search(size(search,1)-...
    difference+1:size(search,1)),'UniformOutput',false);

result(end+1:end+difference) = cellfun(@(x)(field(x,2)),Zlogic,...
    'UniformOutput',false);

result = result';

toc
end

Answer 1

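348k is not that large. Consider building a containers.Map object that maps each entry in the first column of field to the corresponding entry in the second column. That way you do not need an exhaustive scan of field for every entry in search.

[Edited to add:] If the total number of entries is 348k, I don't think there is any need to split the data further.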
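
A minimal sketch of the containers.Map idea, using small made-up stand-ins for search and field. It assumes the keys in field(:,1) are character vectors and are unique. That is a real difference from the original routine, which returns every matching row for each search term, whereas a map holds one value per key. Search terms with no match are left as empty entries here, which is also just one possible choice.

```matlab
% Hypothetical sample data standing in for the real 348,000 x 2 field
% and 340,000 x 1 search cell arrays.
field  = {'apple',  'red'; ...
          'banana', 'yellow'; ...
          'cherry', 'dark red'};
search = {'cherry'; 'apple'; 'durian'};

% Build the map once: keys from column 1, values from column 2.
% Assumption: the keys are unique character vectors.
lookup = containers.Map(field(:,1), field(:,2));

% Check which search terms exist as keys (logical mask, same size as search).
found = isKey(lookup, search);

% Preallocate the result; entries without a match stay empty ([]).
result = cell(size(search));

% Fetch all matched values in one call instead of scanning field per term.
result(found) = values(lookup, search(found));
```

With this sample data, result holds 'dark red', 'red', and an empty entry for 'durian'. Each entry is the value itself rather than the 1 x 1 cell that field(x,2) produces in the original code. Because the map is built once and every lookup is a key access, the 1000-element chunking should not be needed under these assumptions.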

Answer 1: 1, accepted, 2016-04-25 16:25:13