
Vectorizing code - How to reduce MATLAB computational time

I have this piece of code:

N=10^4;
GRID=[]; % collects the event times from every simulation
for i = 1:N
    [E,X,T] = fffun(); % Stochastic simulation. Returns three different vectors (each of length 10^3) every time.
    X_(i,:)=X;
    T_(i,:)=T;
    GRID=[GRID T];
end
GRID=unique(GRID);
% Second part
for i=1:N
    for j=1:(kmax)
        f=find(GRID==T_(i,j) | GRID==T_(i,j+1));
        s=f(1);
        e=f(2)-1;

        counter(X_(i,j), s:e)=counter(X_(i,j), s:e)+1;
    end
end

The code performs N different simulations of a stochastic process (each consisting of 10^3 events occurring at discrete times, the T vector, which depend on the specific simulation). Now (second part) I want to know, as a function of time, how many simulations are in a particular state (X takes values between 1 and 10). The idea I had: create a grid vector with all the moments at which something happens in any simulation. Then, looping over the simulations, loop over the time steps at which something happens and increment all the counter indices that correspond to that particular slice of time.

However, this second part is very heavy (I mean days of processing on a standard quad-core CPU), and it shouldn't be. Are there any ideas (maybe about comparing vectors in a more efficient way) to cut the CPU time?

This is a standalone 'second_part':

N=5000;
counter=zeros(11,length(GRID));

for i=1:N
    disp(['Counting sim #' num2str(i)]);
    for j=1:(kmax)
        % indices in GRID of the interval [T_(i,j), T_(i,j+1))
        f=find(GRID==T_(i,j) | GRID==T_(i,j+1),2);
        s=f(1);
        e=f(2)-1;

        % simulation i was in state X_(i,j) during this whole slice of GRID
        counter(X_(i,j), s:e)=counter(X_(i,j), s:e)+1;

    end
end

counter=counter/N; % fraction of simulations in each state at each grid time
stop=find(GRID==Tmin);
stop=stop-1;
plot(counter(:,(stop-500):stop)')

with associated dummy data (filedropper.com/data_38). In the real context the matrix has 2x the rows and 10x the columns.

Here is what I understand:

T_ is a matrix of time steps from N simulations.
X_ is a matrix of the simulation state at T_ in those simulations.

so if you do:

[ut,~,ic]= unique(T_(:));

you get ic, which is a vector of indices for all unique elements in T_. Then you can write:

counter = accumarray([ic X_(:)],1);

and get counter with the number of rows equal to your unique timesteps, and the number of columns equal to the unique states in X_ (which are all, and must be, integers). Now you can say that for each timestep ut(k), the number of times the simulations were in state m is counter(k,m).

In your data, the only combination of m and k that has a value greater than 1 is (1,1).
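As a minimal toy sketch of this unique + accumarray pattern (the T_toy / X_toy values below are made up, just to show the shape of the result):

T_toy = [0.1 0.5; 0.1 0.9];      % 2 simulations, 2 recorded time steps each
X_toy = [1 2; 3 2];              % states at those time steps
[ut,~,ic] = unique(T_toy(:));    % ut = [0.1; 0.5; 0.9]
counter = accumarray([ic X_toy(:)],1);
% counter is 3x3 here: counter(1,1)=1 and counter(1,3)=1, because both
% simulations have an event at time 0.1 (one in state 1, one in state 3).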


Edit:

From the comments below, I understand that you record all state changes and the time steps at which they occur. Then, every time a simulation changes state, you want to collect the states from all simulations and count how many states there are of each type.

The main problem here is that your time is continuous, so basically each element in T_ is unique, and you have over a million time steps to loop over. Fully vectorizing such a process would need about 80GB of memory, which would probably get your computer stuck.

So I looked for a combination of vectorizing and looping over the time steps. We start by finding all unique intervals and preallocating counter:

ut = unique(T_(:));
stt = 11; % no. of states
counter = zeros(stt,numel(ut));
r = 1:size(T_,1); % we will need that also later

Then we loop over all the elements in ut, and each time look for the relevant time step in T_ in all simulations, in a vectorized way. Finally we use histcounts to count the states:

for k = 1:numel(ut)
    temp = T_<=ut(k); % mark all time steps before ut(k)
    s = cumsum(temp,2); % count the columns
    col_ind = s(:,end); % find the column index for each simulation
    % convert the columns to linear indices:
    linind = sub2ind(size(T_),r,col_ind.');
    % count the states:
    counter(:,k) = histcounts(X_(linind),1:stt+1);
end

This takes about 4 seconds on my computer for 1000 simulations, so it adds up to a little more than one hour for the whole process. Not very quick...

You can also try one or two of the tweaks below to squeeze the run time a little more:

  1. As you can read here, accumarray seems to work faster than histcounts on small arrays, so you may want to switch to it.

  2. Also, computing linear indices directly is a quicker method than sub2ind, so you may want to try that.

Implementing these suggestions in the loop above, we get:

R = size(T_,1);
r = (1:R).';
K = numel(ut);
for k = 1:K
    temp = T_<=ut(k); % mark all time steps before ut(k)
    s = cumsum(temp,2); % count the columns
    col_ind = s(:,end); % find the column index for each simulation
    % convert the columns to linear indices:
    linind = R*(col_ind-1)+r;
    % count the states:
    counter(:,k) = accumarray(X_(linind),1,[stt 1]);
end

On my computer, switching to accumarray and/or removing sub2ind gave a slight improvement, but it was not consistent (using timeit for testing on 100 or 1K elements in ut), so you had better test it yourself. However, this still remains very long.


One thing you may want to consider is discretizing your time steps, so you have far fewer unique elements to loop over. In your data, about 8% of the time intervals are smaller than 1. If you can assume that this is short enough to be treated as a single time step, then you could round your T_ and get only ~12.5K unique elements, which take about a minute to loop over. You can do the same for 0.1 intervals (which are less than 1% of the time intervals) and get 122K elements to loop over, which would take about 8 hours...
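A minimal sketch of that rounding idea (just illustrative; T_r is a made-up name, and rounding to whole time units assumes that resolution is acceptable for your process):

T_r = round(T_);        % discretize the event times to integer steps
ut = unique(T_r(:));    % far fewer unique elements to loop over
% then run the same loop as above, only comparing against T_r:
% temp = T_r <= ut(k); ...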

Of course, all the timings above are rough estimates using the same algorithm. If you do choose to round the times, there may be even better ways to solve this.
