Why does writing to an unrelated file cause the load function to be so slow?
I've just spent a while debugging some particularly slow code and have been completely thrown off by the MATLAB profiler. This looks to me like a massive bug, so I was wondering if anyone could shed any light on what is going on here.

Here is some code that reproduces the problem:
    function profiler_test

    %%% Create 20 files with random data
    count = 20;
    for i = 1 : count
        x = rand(3);
        save(sprintf('temp_file_%06d', i), 'x');
    end

    %%% Load them in a for loop
    xs = cell(1, count);
    tic;
    for i = 1 : count
        x = load(sprintf('temp_file_%06d', i), 'x');
        xs{i} = x.x;
    end
    toc

    %%% Load them in a for loop, but writing a small log file on the way
    tic;
    for i = 1 : count
        x = load(sprintf('temp_file_%06d', i), 'x');
        xs{i} = x.x;

        file = fopen(sprintf('temp_logfile_%d', i), 'w');
        fprintf(file, 'Success\n');
        fclose(file);
    end
    toc

    end
The first for loop takes 0.239739 seconds; the second takes 4.411179 seconds.
Now, I should make it clear that I am aware that the idea shown in the second for loop, creating a log file for each result, was sloppy. I was running on a cluster where I couldn't see the output and wanted a cheap indication of the function's progress, and this turned out to be the bottleneck. I'm fine with that.
My problem, however, is that I've spent a day trying to optimise the wrong line, because the MATLAB profiler says this:
      time   calls   line
                 1     24   tic;
                 1     25   for i = 1 : count
      4.41      20     26       x = load(sprintf('temp_file_%06d', i), 'x');
                20     27       xs{i} = x.x;
                       28
                20     29       file = fopen(sprintf('temp_logfile_%d', i), 'w');
                20     30       fprintf(file, 'Success\n');
                20     31       fclose(file);
                20     32   end
                 1     33   toc
It has attributed the entire time I expected the final three lines to take to the line containing the load call. In my actual program, the load was not so close to the logging code, so this didn't occur to me until I decided to distrust the profiler. My question is: what is going on here? Why has this happened, and should I be watching out for any more bizarre behaviour like this?

I'm using MATLAB 2011a. Many thanks.
EDIT: I seem to be causing some confusion, apologies. Here is the situation:

The for loops shown above are identical, except that the second one has three lines at the bottom which write to a temporary file each iteration.

The load function call, exactly the same call as in the first loop (which was faster), is now taking 4 seconds instead of 0.2.

So either the presence of the last three lines causes the load to be slow (I had disregarded this; is that even a possibility?), or the MATLAB profiler is incorrectly reporting that load is taking 4 seconds when it clearly is not. Either way, it seems to me that something very strange is happening.
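One way to tell these two possibilities apart, independently of the profiler, is to time the two halves of the loop body separately with tic/toc. The following is only a sketch under assumptions: the accumulator names load_time and log_time are made up for illustration, and the file names simply mirror the test function above.

```matlab
% Sketch: accumulate wall-clock time for the load and for the log writing
% separately, as a cross-check on the profiler's per-line figures.
% load_time and log_time are illustrative names, not part of the original code.
count = 20;

% Recreate the small data files so the sketch runs on its own
for i = 1 : count
    x = rand(3);
    save(sprintf('temp_file_%06d', i), 'x');
end

xs = cell(1, count);
load_time = 0;
log_time = 0;
for i = 1 : count
    t = tic;                                      % timer for the load only
    x = load(sprintf('temp_file_%06d', i), 'x');
    xs{i} = x.x;
    load_time = load_time + toc(t);

    t = tic;                                      % timer for the log writing only
    file = fopen(sprintf('temp_logfile_%d', i), 'w');
    fprintf(file, 'Success\n');
    fclose(file);
    log_time = log_time + toc(t);
end
fprintf('load: %.3f s, log writing: %.3f s\n', load_time, log_time);
```

If load_time accounts for most of the total, the profiler's attribution is right and the slowdown really is inside load; if log_time dominates, the profiler is misreporting.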
EDIT: Seem to have answered it myself, see below. Changed the title as it was misleading.
I do not see any evidence of a bug in your post.

You mention that the entire loop takes about 4.411 seconds, and the profiler shows that line 26 takes about 4.41 seconds.

This means that all other lines together take less than 0.01 seconds, and therefore each of them rounds to zero seconds.

My guess is that zeroes are simply not printed, and that you interpreted this as the other lines not being timed.

I may be missing something, but so far the output provided by MATLAB seems to be consistent.
Actually, I think I've solved it. I was wrong to jump to the conclusion that the additional processing time was occurring on the new lines, so my question is now a little misleading: the profiler is correct. However, I still didn't understand why writing to a temporary file would cause load to slow down. I had a thought, which was to try this:
    file = fopen(sprintf('../temp_logfile_%d', i), 'w');
That is, write to a file in the parent directory instead of the current working directory. This removed the problem, and was very fast.

The reason, I am guessing, is that the current directory is in my MATLAB search path, as are a bunch of other directories. I presume that every time MATLAB uses a function which looks through the whole search path, as load does, it checks whether any directories have been modified and, if so, re-parses the whole lot to see which files are available. Writing a new file to the working directory would certainly have caused this. It may have been worse in my case, since I also have a whole tree of subdirectories in the working directory which are part of the search path.

Anyway, thanks to those who had a look, and sorry that the answer turned out to be something quite different from the question. Be aware when using functions which rely on the entire search path!
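As a rough illustration of the workaround, the progress markers could be written to a directory that is not on the search path at all. The sketch below is hypothetical: the helper name write_progress_marker and the folder name profiler_test_logs are invented here, and it assumes that the system temporary directory returned by tempdir is not on the MATLAB search path. Whether this avoids the slowdown on a given setup has not been measured here.

```matlab
function write_progress_marker(i)
% WRITE_PROGRESS_MARKER  Hypothetical helper: write a small progress file
% outside the current working directory.
% Assumption: tempdir is not on the MATLAB search path.
    log_dir = fullfile(tempdir, 'profiler_test_logs');   % illustrative folder name
    if ~exist(log_dir, 'dir')
        mkdir(log_dir);
    end
    file = fopen(fullfile(log_dir, sprintf('temp_logfile_%d', i)), 'w');
    if file == -1
        error('Could not open a log file in %s', log_dir);
    end
    fprintf(file, 'Success\n');
    fclose(file);
end
```

Inside the second loop, the three logging lines would then be replaced by a single call to write_progress_marker(i).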
I get the following report from the profiler in MATLAB 2012b; I don't see a bug.