Segmentation fault when running a Tornado app on multiple cores

Question

I am running a Tornado app as shown below:

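import tornado.httpserver
import tornado.ioloop

app = make_app()  # make_app() is defined elsewhere and builds the application
server = tornado.httpserver.HTTPServer(app)
server.bind(8888)
server.start(0)  # autodetect number of cores and fork a process for each
print("server started at port 8888")
tornado.ioloop.IOLoop.current().start()  # run this worker's event loop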

This successfully starts the app on all available cores. This is the code that runs on each API call:

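import time
import mxnet as mx

# ch_dev, resnet_50, args, img and im_info are defined elsewhere in the app
ctx = mx.cpu(0)
# load the trained weights from the checkpoint
_, arg_params, aux_params = mx.model.load_checkpoint(args.prefix, args.epoch)
arg_params, aux_params = ch_dev(arg_params, aux_params, ctx)
sym = resnet_50(num_class=2)
# add the image and its metadata as network inputs
arg_params["data"] = mx.nd.array(img, ctx)
arg_params["im_info"] = mx.nd.array(im_info, ctx)
# bind for inference only: no gradients are allocated
exe = sym.bind(ctx, arg_params, args_grad=None, grad_req="null", aux_states=aux_params)
print("detect 4")
tic = time.time()
print("detect 5")
exe.forward(is_train=False)
print("detect 6")
# map output names to result arrays, then drop the first column of the ROIs
output_dict = {name: nd for name, nd in zip(sym.list_outputs(), exe.outputs)}
rois = output_dict['rpn_rois_output'].asnumpy()[:, 1:]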

When I run the Tornado app on a single core it works fine, but on multiple cores the code above runs up to its last line, and after that I get this error:

Segmentation fault: 11

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so                         0x0000000116ef741f _ZN5mxnet15segfault_loggerEi + 63
[bt] (1) 1   libsystem_platform.dylib            0x00007fff6ce4af5a _sigtramp + 26
[bt] (2) 2   libsystem_malloc.dylib              0x00007fff6cd73cc0 malloc_zone_calloc + 87
[bt] (3) 3   CarbonCore                          0x00007fff46798117 _ZL22connectToCoreServicesDv + 258
[bt] (4) 4   CarbonCore                          0x00007fff46797fe4 _ZL9getStatusv + 24
[bt] (5) 5   CarbonCore                          0x00007fff46797f62 scCreateSystemServiceVersion + 49
[bt] (6) 6   CarbonCore                          0x00007fff46799392 FileIDTreeGetCachedPort + 213
[bt] (7) 7   CarbonCore                          0x00007fff467991f2 FSNodeStorageGetAndLockCurrentUniverse + 79
[bt] (8) 8   CarbonCore                          0x00007fff46799080 FileIDTreeGetAndLockVolumeEntryForDeviceID + 38
[bt] (9) 9   CarbonCore                          0x00007fff46798fdd _ZN7FSMountC2Ej17FSMountNumberTypePiPKj + 75
child 3 (pid 42579) exited with status 255, restarting

Answer 1

I encountered a similar problem when I was using mxnet with multiprocessing and OpenCV. I didn't use Tornado, but the symptoms were the same: a single-process setup worked fine, but as soon as I enabled multiprocessing, I got segmentation faults.
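Why would multiprocessing matter? One likely factor (an assumption, not something the answer states): Tornado's server.start(0) creates its workers with fork(), as does multiprocessing's fork start method, and POSIX fork() copies only the thread that called it. A library that already started an internal worker thread pool in the parent (OpenCV can use one) leaves each child with the pool's bookkeeping but none of its threads. The stdlib-only sketch below uses a plain Python thread as a stand-in for such a pool; the function name thread_counts_across_fork is hypothetical, and it runs on POSIX systems only. Python 3.12 and later also emit a DeprecationWarning when a multi-threaded process forks, which is exactly this pattern.

```python
import os
import threading


def thread_counts_across_fork():
    """Return (threads in parent, threads in forked child) while a helper thread runs.

    The helper thread is a hypothetical stand-in for a library's internal
    worker pool; it is an illustration, not OpenCV or MXNet code.
    """
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait, daemon=True)
    helper.start()

    read_fd, write_fd = os.pipe()
    pid = os.fork()  # the same system call server.start(0) makes once per worker
    if pid == 0:
        # Child process: only the thread that called fork() was copied here.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)  # exit immediately, skipping the parent's cleanup handlers

    # Parent process: read the child's report, then reap the child.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)

    parent_count = threading.active_count()
    stop.set()
    helper.join()
    return parent_count, child_count


if __name__ == "__main__":
    parent, child = thread_counts_across_fork()
    print("parent threads:", parent, "child threads:", child)
```

The parent sees at least two threads (main plus helper), while the child sees only one: any state the helper thread was supposed to maintain is left without an owner in the child.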

It turns out that my problem was related to this issue: https://github.com/opencv/opencv/issues/5150, and I fixed it by calling cv2.setNumThreads(0) at the beginning of my code. Since you are using ResNet, I assume that you also have a dependency on OpenCV.

I also noticed that quite a few segmentation-fault issues were fixed in mxnet version 1.1, so if you are not using that version yet, I recommend upgrading to it, as it is much more stable.
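If you are not sure which MXNet version your workers actually import, a small check can flag anything older than 1.1. This is a sketch: the helper name mxnet_older_than is hypothetical, and it compares only the major and minor numbers of a version string such as mx.__version__.

```python
import re


def mxnet_older_than(version, minimum=(1, 1)):
    """Return True if a dotted version string is below `minimum` (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) < tuple(minimum)


if __name__ == "__main__":
    import mxnet as mx

    if mxnet_older_than(mx.__version__):
        print("MXNet", mx.__version__, "is older than 1.1; consider upgrading")
```

Only the first two numeric components are compared, so pre-release suffixes such as "b20180412" do not affect the result.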

Answer 1: 1, 2018-04-12 21:34:56