
Why does writing to an unrelated file cause the load function to be so slow?

I've just spent a while debugging some particularly slow code and have been completely thrown off by the MATLAB profiler. This looks to me like a massive bug, so I was wondering if anyone could shed any light on what is going on here.

Here is some code that will cause the problem:

function profiler_test

  %%% Create 20 files with random data

  count = 20;

  for i = 1 : count
    x = rand(3);
    save(sprintf('temp_file_%06d', i), 'x'); 
  end

  %%% Load them in a for loop

  xs = cell(1, count);

  tic;
  for i = 1 : count
    x = load(sprintf('temp_file_%06d', i), 'x');
    xs{i} = x.x;
  end
  toc

  %%% Load them in a for loop, but writing a small log file on the way

  tic;
  for i = 1 : count
    x = load(sprintf('temp_file_%06d', i), 'x');
    xs{i} = x.x;

    file = fopen(sprintf('temp_logfile_%d', i), 'w');
    fprintf(file, 'Success\n');
    fclose(file);
  end
  toc


end

The first for loop takes 0.239739 seconds, the second takes 4.411179 seconds.

Now, I should make it clear that I am aware that creating a log file for each result, as in the second for loop, was a sloppy idea. I was running on a cluster where I couldn't see the output and wanted a cheap indication of the function's progress, and this turned out to be the bottleneck. I'm fine with that.

My problem, however, is that I've spent a day trying to optimise the wrong line, because the MATLAB profiler says this:

         1   24   tic; 
         1   25   for i = 1 : count 
4.41    20   26     x = load(sprintf('temp_file_%06d', i), 'x'); 
        20   27     xs{i} = x.x; 
             28     
        20   29     file = fopen(sprintf('temp_logfile_%d', i), 'w'); 
        20   30     fprintf(file, 'Success\n'); 
        20   31     fclose(file); 
        20   32   end 
         1   33   toc

It has placed the entire time taken to execute the final three lines on the line for load. In my actual program, the load was not so close to the other bit, so it didn't occur to me until I decided to distrust the profiler. My question is: what is going on here? Why has this happened, and should I be watching out for any more bizarre behaviour like this?

I'm using MATLAB 2011a. Many thanks.

EDIT: I seem to be causing some confusion, apologies. Here is the situation:

  • The two for loops shown above are identical, except that the second one has three lines at the bottom which write to a temporary file each iteration.
  • The second loop takes substantially longer to run: the conclusion is that those last three lines are to blame for the slowdown. When they are removed, the code is fast again.
  • However, the profiler does not attribute any of the time for the second loop to those final three statements. Instead, it tells me that my load function call - exactly the same call as in the first loop, which was faster - is now taking 4 seconds instead of 0.2. So either the presence of the last three lines causes the load to be slow (I had disregarded this; is that even a possibility?), OR the MATLAB profiler is incorrectly reporting that load is taking 4 seconds when it clearly is not.

Either way, it seems to me that something very strange is happening.

EDIT: I seem to have answered it myself, see below. I changed the title as it was misleading.

I do not see any evidence of a bug in your post.

You mention that the entire loop takes about 4.411 seconds and the profiler shows that line 26 takes about 4.41 seconds.

This means that all other lines together take less than 0.01 seconds, and therefore each of them rounds to zero seconds.

My guess is that zeroes are simply not printed, and that you interpreted this as the other lines not being timed.

I may be missing something, but so far the output provided by MATLAB seems to be consistent.
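One way to cross-check the profiler is to accumulate tic/toc timings for the load call and for the log-writing lines separately. The following is only a minimal sketch built from the code in the question; the accumulator names t_load and t_log are made up for illustration, and the timings it prints will differ from machine to machine.

```matlab
count = 20;

% Create the data files in the current directory, as in the question.
for i = 1 : count
    x = rand(3);
    save(sprintf('temp_file_%06d', i), 'x');
end

xs = cell(1, count);
t_load = 0;   % total seconds spent in load (illustrative name)
t_log  = 0;   % total seconds spent writing log files (illustrative name)

for i = 1 : count
    % Time only the load call.
    t = tic;
    x = load(sprintf('temp_file_%06d', i), 'x');
    t_load = t_load + toc(t);
    xs{i} = x.x;

    % Time only the three log-writing lines.
    t = tic;
    file = fopen(sprintf('temp_logfile_%d', i), 'w');
    fprintf(file, 'Success\n');
    fclose(file);
    t_log = t_log + toc(t);
end

fprintf('load: %.3f s, logging: %.3f s\n', t_load, t_log);
```

If the manual timings agree with the profiler, most of the time should show up in t_load rather than t_log.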

Actually, I think I've solved it. I was wrong to jump to the conclusion that the additional processing time was occurring on the new lines, so my question is now a little misleading - the profiler is correct. However, I still didn't understand why writing to a temporary file would cause load to slow down. I had a thought, which was to try this:

file = fopen(sprintf('../temp_logfile_%d', i), 'w');

That is, write to a file in the parent directory instead of the current working directory. This removed the problem, and the loop was very fast. The reason, I am guessing, is that the current directory is in my MATLAB search path, as are a bunch of other directories. I presume that every time MATLAB uses a function which looks through the whole search path, as load does, it checks to see if any directories have been modified, and if so re-parses the whole lot to see what files are available. Writing a new file to the working directory certainly would have caused this. This may have been worse in my case, since I also have a whole tree of subdirectories in the working directory which are part of the search path.
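If that guess is right, keeping the progress files out of every directory MATLAB searches should avoid the slowdown. Below is a minimal sketch of the second loop with the log files redirected. It assumes the folder returned by tempdir is not on the search path (worth checking on your own setup), and the subfolder name profiler_test_logs is hypothetical.

```matlab
count = 20;

% Create the data files in the current directory, as in the question.
for i = 1 : count
    x = rand(3);
    save(sprintf('temp_file_%06d', i), 'x');
end

% Hypothetical log folder outside the working directory.
log_dir = fullfile(tempdir, 'profiler_test_logs');
if ~exist(log_dir, 'dir')
    mkdir(log_dir);
end

xs = cell(1, count);

tic;
for i = 1 : count
    x = load(sprintf('temp_file_%06d', i), 'x');
    xs{i} = x.x;

    % Write the progress marker into log_dir instead of the current directory.
    file = fopen(fullfile(log_dir, sprintf('temp_logfile_%d', i)), 'w');
    fprintf(file, 'Success\n');
    fclose(file);
end
toc
```

Comparing the elapsed time with the original second loop shows whether the working directory is what matters on a given installation.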

Anyway, thanks to those who had a look, and sorry that the answer turned out to be something quite different from the question. Be aware when using functions which rely on the entire search path!

I get the following report generated by the profiler of MATLAB 2012b; I don't see a bug.
[Profiler report screenshots]
