Running a Matlab batch job on an HPC cluster
I am trying to get Matlab to execute a number of scripts as individual batch jobs. Each script loads some data from Excel sheets and implements a neural network. The neural network uses parfor loops internally for parameter tuning.
When I run the batch job on my local machine it works fine. My Matlab code looks like this:
job1 = batch('Historical1Step', ...
    'Profile', 'local', ...
    'Matlabpool', 3, ...
    'CaptureDiary', true, ...
    'CurrentDirectory', '.');
try
    job1Results = fetchOutputs(job1);
catch err
    delete(job1);
    rethrow(err);
end
delete(job1);
and the diary output I get is:
--- Start Diary ---
Analysing data for stock BAX
num_its =
2
100%[============================
100%[===================================================]
--- End Diary ---
However, when I change from the 'local' config to my server config I get:
--- Start Diary ---
--- End Diary ---
Error using parallel.Job/fetchOutputs (line 869)
An error occurred during execution of Task with ID 1.
Error in SOExample (line 14)
job1Results = fetchOutputs(job1);
Caused by:
Index exceeds matrix dimensions.
I assume the problem has something to do with the visibility of my functions/data on the workers, but I have tried every combination of the 'FileDependencies' and 'PathDependencies' options of the batch function that I can think of, to no avail.
Any help would be much appreciated, and apologies in advance if I have done something monumentally stupid without realising it!
EDIT:
The error stack is as follows:
Index exceeds matrix dimensions.
Error in Historical1Step (line 13)
Error in parallel.internal.cluster/executeScript (line 22)
eval(['iClearAndSetCallerWorkspace(workspaceIn);' scriptName]);
Error in parallel.internal.evaluator/evaluateWithNoErrors (line 14)
[out{1:nOut}] = feval(fcn, args{:});
Error in parallel.internal.evaluator/CJSStreamingEvaluator/evaluate (line 31)
[out, errOut] = parallel.internal.evaluator.evaluateWithNoErrors( fcn, nOut, args );
Error in dctEvaluateTask>iEvaluateTask/nEvaluateTask (line 281)
[output, errOutput, cellTextOutput{end+1}] = evaluator.evaluate(fcn, nOut, args);
Error in dctEvaluateTask>iEvaluateTask (line 141)
nEvaluateTask();
Error in dctEvaluateTask (line 57)
[resultsFcn, taskPostFcn, taskEvaluatedOK] = iEvaluateTask(job, task, runprop);
Error in distcomp_evaluate_filetask_core>iDoTask (line 149)
dctEvaluateTask(postFcns, finishFcn);
Error in distcomp_evaluate_filetask_core (line 48)
iDoTask(handlers, postFcns);
Error using parallel.Job/fetchOutputs (line 869)
An error occurred during execution of Task with ID 1.
Error in SOExample (line 14)
job1Results = fetchOutputs(job1);
Caused by:
Index exceeds matrix dimensions.
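A worker-side stack like the one above can be read from the failed task itself rather than from the client-side exception. The following is only a minimal sketch, assuming `job1` is the job object returned by the `batch` call and has not yet been deleted. It also assumes a Parallel Computing Toolbox release in which a task's `Error` property holds an MException; older releases may only expose `ErrorMessage`.

```matlab
% Assumption: job1 comes from the batch(...) call and still exists.
wait(job1);                        % block until the job has finished or failed
task = job1.Tasks(1);              % a batch script runs as the job's first task
if ~isempty(task.Error)
    disp(getReport(task.Error));   % full stack as recorded on the worker
end
diary(job1);                       % show whatever the script printed before failing
```

Reading the report before calling `delete(job1)` matters, because deleting the job also discards its task data.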
The file 'Historical1Step' is the script I am trying to run. The first lines (up to the point where the code falls over) are:
wrkDir = 'V:\Individual\SOFNN'; % this is where the files are on cluster headnode
wrkFldr = [wrkDir '\ExcelSheets\1-stepAhead\']; % location of excel sheets
%%
folder = dir(wrkFldr);
isub = [folder(:).isdir]; % data is stored in sub-directory based on stock symbol
stockNames = {folder(isub).name}'; % extract stock names from names of sub-dirs
stockNames(ismember(stockNames,{'.','..'})) = []; % remove names '.' and '..'
for i = 1:1 % this loop should read in data for stock i from correct sub-dir
    close all;
    clc;
    sym = stockNames{i};
    disp(['Analysing data for stock ' sym]);
    fldrName = strcat(wrkFldr,'\', sym, '\');
end % added for completion
In your code, you're using a mapped-drive letter on the workers. Typically, workers cannot see mapped-drive letters because of the way the worker processes are launched. That would most likely explain the error: if the worker cannot resolve V:\, dir returns an empty result, stockNames ends up empty, and indexing stockNames{i} fails with "Index exceeds matrix dimensions". Try using a UNC path instead. There is a little more info in the documentation here: http://www.mathworks.com/help/distcomp/troubleshooting-and-debugging.html
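As a rough illustration of that suggestion, the top of the script could be rewritten along these lines. This is a sketch under assumptions, not a drop-in fix: `\\headnode\share` is a hypothetical server and share name standing in for whatever V: is actually mapped to, the workers are assumed to run Windows, and the two `error` identifiers are made up for this example. The guards are an addition so that a missing folder produces a readable message instead of an indexing error.

```matlab
% Hypothetical UNC equivalent of V:\ ; replace server and share with the real names.
wrkDir  = '\\headnode\share\Individual\SOFNN';
wrkFldr = fullfile(wrkDir, 'ExcelSheets', '1-stepAhead');

% Fail early with a readable message if the worker cannot see the folder.
if exist(wrkFldr, 'dir') ~= 7
    error('Historical1Step:noDataDir', ...
          'Worker cannot see data folder: %s', wrkFldr);
end

folder     = dir(wrkFldr);
isub       = [folder.isdir];                          % keep sub-directories only
stockNames = {folder(isub).name}';                    % one sub-directory per stock symbol
stockNames(ismember(stockNames, {'.', '..'})) = [];   % drop '.' and '..'

% An empty list here is what would otherwise surface as an indexing error.
if isempty(stockNames)
    error('Historical1Step:noStocks', ...
          'No stock sub-directories found in: %s', wrkFldr);
end
```

The account that runs the worker processes also needs read permission on the share. A UNC path only fixes name resolution, not access rights.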