For each element in X, find the index of the largest value in Y that does not exceed it

Question

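I am looking for a way to improve the performance of the following algorithm. Given two arrays X and Y:

For each element of X, find the index of the largest value in Y that does not exceed the value of that element. It is safe to assume that X and Y are monotonically increasing (sorted) and that Y(1) is less than every value in X. X is also typically a much larger array than Y.

As an example, given the following: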

X = [0.2, 1.5, 2.2, 2.5, 3.5, 4.5, 5.5, 5.8, 6.5];
Y = [0.0, 1.0, 3.0, 4.0, 6.0];

I would expect the output to be

idx = [1, 2, 2, 2, 3, 4, 4, 4, 5]
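To pin down the expected output, here is a minimal brute-force sketch of the definition above. It is not the asker's code; the variable name idxRef is hypothetical, and it assumes Y is sorted ascending with Y(1) no larger than any value in X, as stated in the question.

```matlab
% Brute-force reference: for each X(i), take the last position in Y
% whose value does not exceed X(i).
X = [0.2, 1.5, 2.2, 2.5, 3.5, 4.5, 5.5, 5.8, 6.5];
Y = [0.0, 1.0, 3.0, 4.0, 6.0];

idxRef = zeros(size(X));   % idxRef is a hypothetical name
for i = 1:numel(X)
    % Y is sorted, so the last match is the largest value <= X(i)
    idxRef(i) = find(Y <= X(i), 1, 'last');
end
disp(idxRef)
```

This scans Y once per element of X, so it is only useful as a correctness check for the faster approaches.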

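The fastest approach I have come up with is the function below. It does not take advantage of the fact that the lists are sorted, and it uses a for loop to step through one of the arrays. It gives a valid solution, but in the experiments where I use this function, it accounts for nearly 27 of the 30 minutes the analysis takes in total.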

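function idx = matchintervals(X,Y)
  idx = zeros(size(X));
  % For each interval [Y(i), Y(i+1)), scan all of X and label the matches
  for i = 1:length(Y)-1
    idx(X >= Y(i) & X < Y(i+1)) = i;
  end
  % Everything at or above the last value of Y maps to the last index
  idx(X >= Y(end)) = length(Y);
end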

Any help is greatly appreciated.

Answer 1

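If you are looking for the fastest solution, it may well end up being a simple while loop, which takes advantage of the fact that the arrays are sorted: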

X = [0.2, 1.5, 2.2, 2.5, 3.5, 4.5, 5.5, 5.8, 6.5];
Y = [0.0, 1.0, 3.0, 4.0, 6.0];

xIndex = 1;
nX = numel(X);
yIndex = 1;
nY = numel(Y);
index = zeros(size(X))+nY;  % Prefill index with the largest index in Y

while (yIndex < nY) && (xIndex <= nX)
  if X(xIndex) < Y(yIndex+1)
    index(xIndex) = yIndex;
    xIndex = xIndex+1;
  else
    yIndex = yIndex+1;
  end
end

>> index

index =

     1     2     2     2     3     4     4     4     5

This loop iterates at most numel(X)+numel(Y)-1 times, and potentially fewer if many values in X are greater than the largest value in Y.
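The iteration bound can be checked on the example data with an instrumented copy of the loop. This is only a sketch: the counter nIter is a hypothetical addition and not part of the original answer.

```matlab
X = [0.2, 1.5, 2.2, 2.5, 3.5, 4.5, 5.5, 5.8, 6.5];
Y = [0.0, 1.0, 3.0, 4.0, 6.0];

xIndex = 1;  nX = numel(X);
yIndex = 1;  nY = numel(Y);
index = zeros(size(X)) + nY;   % prefill with the largest index in Y
nIter = 0;                     % hypothetical counter, added for illustration

while (yIndex < nY) && (xIndex <= nX)
  nIter = nIter + 1;
  if X(xIndex) < Y(yIndex+1)
    index(xIndex) = yIndex;    % X(xIndex) lies in [Y(yIndex), Y(yIndex+1))
    xIndex = xIndex + 1;
  else
    yIndex = yIndex + 1;       % advance to the next interval of Y
  end
end

fprintf('%d iterations, bound is %d\n', nIter, nX + nY - 1);
```

Every pass advances either xIndex or yIndex by one, which is why the total cannot exceed numel(X)+numel(Y)-1.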

Timings: I ran some timings with the sample data from a comment. Here are the results, sorted from fastest to slowest:

X = 1:3:(4e5);
Y = 0:20:(4e5-1);

% My solution from above:
tElapsed =
   0.003005977477718 seconds

% knedlsepp's solution:
tElapsed =
   0.006939387719075 seconds

% Divakar's solution:
tElapsed =
   0.011801273498343 seconds

% H.Muster's solution:
tElapsed =
   4.081793325423575 seconds
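The answer does not show how the timings were taken. One way such a measurement might be set up is sketched below with tic and toc, using the sort-based approach from Answer 4 as the code under test. The repeat count nRuns and the best-of-several strategy are assumptions, and the numbers will differ from machine to machine.

```matlab
% Sample data quoted in the timings above
X = 1:3:(4e5);
Y = 0:20:(4e5-1);

nRuns = 5;        % hypothetical repeat count
tBest = Inf;
for r = 1:nRuns
    tStart = tic;
    % Sort-based approach: the stable sort keeps Y ahead of any equal X
    [~, I] = sort([Y, X]);
    idx = find(I > numel(Y)) - (1:numel(X));
    tBest = min(tBest, toc(tStart));
end
fprintf('best of %d runs: %.6f seconds\n', nRuns, tBest);
```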

Answer 2

A one-liner, but probably slower than gnovice's solution:

idx = sum(bsxfun(@ge, X, Y'), 1);  % sum down the columns explicitly, so a one-element Y also works
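For readers on MATLAB R2016b or later, the same comparison can be written with implicit expansion instead of bsxfun. This is a sketch that assumes X and Y are row vectors; the names idxBsxfun and idxImplicit are hypothetical.

```matlab
X = [0.2, 1.5, 2.2, 2.5, 3.5, 4.5, 5.5, 5.8, 6.5];
Y = [0.0, 1.0, 3.0, 4.0, 6.0];

% Both forms build a numel(Y)-by-numel(X) logical matrix and count,
% per column, how many values of Y are <= the corresponding X.
idxBsxfun   = sum(bsxfun(@ge, X, Y.'), 1);
idxImplicit = sum(X >= Y.', 1);      % implicit expansion, R2016b+

disp(isequal(idxBsxfun, idxImplicit))
```

The intermediate matrix has numel(X)*numel(Y) entries, so memory use grows quickly for large inputs. That is a likely reason this approach trails the loop and the sort-based solutions.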

Answer 3

Using sort and a few masks:

%// Concatenate X and Y and find the sorted indices
[sXY,sorted_id] = sort([X Y]);

%// Take care of sorted_id for identical values between X and Y
dup_id = find(diff(sXY)==0);
tmp = sorted_id(dup_id);
sorted_id(dup_id) = sorted_id(dup_id+1);
sorted_id(dup_id+1) = tmp;

%// Mask of Y elements in XY array
maskY = sorted_id>numel(X);

%// Find island lengths of Y elements in concatenated XY array
diff_maskY = diff([false maskY false]);
island_lens = find(diff_maskY ==-1) - find(diff_maskY ==1);

%// Create a mask of double datatype with 1s where Y intervals change
mask_Ys = [ false maskY(1:end-1)];
mask_Ysd = double(mask_Ys(~maskY));

%// Incorporate island lengths to change the 1s by offsetted island lengths
valid = mask_Ysd==1;
mask_Ysd(valid) = mask_Ysd(valid) + island_lens(1:sum(valid)) - 1;

%// Finally perform cumsum to get the output indices
idx = cumsum(mask_Ysd);

Answer 4

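I had a similar idea to Divakar's. This uses the stable sort to find the insertion points of the values of X after the values of Y. Both X and Y need to be sorted for this to work correctly!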

%// Calculate the entry points
[~,I] = sort([Y,X]);
whereAreXs = I>numel(Y);
idx = find(whereAreXs)-(1:numel(X));

You can view the values of X alongside the corresponding values of Y that they do not fall below via:

%%// Output:
disp([X;Y(idx)]);
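To see why the stable sort matters, here is a small worked trace on hypothetical data containing ties between X and Y. The data is made up for illustration and does not come from the question.

```matlab
% Hypothetical data with ties: X contains 2 twice, and Y contains 2
X = [1, 2, 2, 5];
Y = [0, 2, 4];

% Y comes first in the concatenation, so the stable sort places
% Y's 2 ahead of X's 2s.
[~, I] = sort([Y, X]);
disp(I)                       % positions in [Y, X], in sorted order

whereAreXs = I > numel(Y);    % true where the sorted entry came from X
% The sorted position of the k-th X, minus k, is the number of Y
% values that precede it, i.e. the index sought.
idx = find(whereAreXs) - (1:numel(X));
disp(idx)
```

Because equal elements keep their original order, a value of X that equals a value of Y is counted as not exceeded by it, which matches the definition in the question.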

4 solutions

Solution 1
4 accepted 2015-05-04 18:42:40

Solution 2
4 2015-05-04 18:56:02

Solution 3
2 2015-05-04 18:21:57

Solution 4
2 2015-05-06 16:18:09
