Getting a segmentation fault while running a Tornado app on multiple cores
I am running a Tornado app as shown below:
import tornado.httpserver
import tornado.ioloop

app = make_app()  # make_app() is the application factory defined elsewhere
server = tornado.httpserver.HTTPServer(app)
server.bind(8888)
server.start(0)  # autodetect number of cores and fork a process for each
print("server started at port 8888")
tornado.ioloop.IOLoop.current().start()
This successfully starts the app on all available cores. This is the code that runs on an API call:
import time

import mxnet as mx

# ch_dev, resnet_50, args, img and im_info are defined elsewhere in the app
ctx = mx.cpu(0)
_, arg_params, aux_params = mx.model.load_checkpoint(args.prefix, args.epoch)
arg_params, aux_params = ch_dev(arg_params, aux_params, ctx)
sym = resnet_50(num_class=2)
# feed the request's image and image info as executor inputs
arg_params["data"] = mx.nd.array(img, ctx)
arg_params["im_info"] = mx.nd.array(im_info, ctx)
exe = sym.bind(ctx, arg_params, args_grad=None, grad_req="null", aux_states=aux_params)
print("detect 4")
tic = time.time()
print("detect 5")
exe.forward(is_train=False)  # inference-only forward pass
print("detect 6")
output_dict = {name: nd for name, nd in zip(sym.list_outputs(), exe.outputs)}
rois = output_dict['rpn_rois_output'].asnumpy()[:, 1:]
When I run the Tornado app on a single core it works fine, but on multiple cores it runs until the last line of the code above, and then I get this error:
Segmentation fault: 11
Stack trace returned 10 entries:
[bt] (0) 0 libmxnet.so 0x0000000116ef741f _ZN5mxnet15segfault_loggerEi + 63
[bt] (1) 1 libsystem_platform.dylib 0x00007fff6ce4af5a _sigtramp + 26
[bt] (2) 2 libsystem_malloc.dylib 0x00007fff6cd73cc0 malloc_zone_calloc + 87
[bt] (3) 3 CarbonCore 0x00007fff46798117 _ZL22connectToCoreServicesDv + 258
[bt] (4) 4 CarbonCore 0x00007fff46797fe4 _ZL9getStatusv + 24
[bt] (5) 5 CarbonCore 0x00007fff46797f62 scCreateSystemServiceVersion + 49
[bt] (6) 6 CarbonCore 0x00007fff46799392 FileIDTreeGetCachedPort + 213
[bt] (7) 7 CarbonCore 0x00007fff467991f2 FSNodeStorageGetAndLockCurrentUniverse + 79
[bt] (8) 8 CarbonCore 0x00007fff46799080 FileIDTreeGetAndLockVolumeEntryForDeviceID + 38
[bt] (9) 9 CarbonCore 0x00007fff46798fdd _ZN7FSMountC2Ej17FSMountNumberTypePiPKj + 75
child 3 (pid 42579) exited with status 255, restarting
I encountered a similar problem when I was using mxnet with multiprocessing and OpenCV. I didn't use Tornado, but the symptoms were the same: a single-process environment worked fine, but as soon as I enabled multiprocessing, I got segmentation faults.
It turns out that my problem was related to this issue: https://github.com/opencv/opencv/issues/5150, and I fixed it by calling cv2.setNumThreads(0) at the beginning of my code. Since you are using ResNet, I assume you also have a dependency on OpenCV.
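A minimal sketch of the fix described above, assuming OpenCV is installed as `cv2` and that `make_app()` is the application factory from the question (hypothetical here). The helper name `disable_opencv_threading` is illustrative. The point is ordering: the call runs in the parent process before `server.start(0)` forks the workers, i.e. at the beginning of the code. Whether it cures the MXNet crash depends on OpenCV's thread pool actually being the culprit in your process.

```python
import os


def disable_opencv_threading() -> bool:
    """Disable OpenCV's internal thread pool, if OpenCV is installed.

    Returns True if cv2.setNumThreads(0) was called, False if cv2 is missing.
    """
    try:
        import cv2  # assumption: the app depends on OpenCV
    except ImportError:
        return False
    cv2.setNumThreads(0)  # 0 = run OpenCV functions sequentially, no worker threads
    return True


def main() -> None:
    """Hypothetical entry point; call main() from your launcher script."""
    import tornado.httpserver
    import tornado.ioloop

    # Do this first, in the parent, before server.start(0) forks the workers.
    disable_opencv_threading()

    app = make_app()  # hypothetical: the application factory from the question
    server = tornado.httpserver.HTTPServer(app)
    server.bind(8888)
    server.start(0)  # fork one worker per detected CPU core
    print(f"worker {os.getpid()} serving on port 8888")
    tornado.ioloop.IOLoop.current().start()
```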
I also noticed that quite a few segmentation fault issues were fixed in mxnet version 1.1, so if you are not using that version, I recommend upgrading to it, as it is much more stable.
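To check whether the installed MXNet predates 1.1, a small sketch like the following could help. It reads `mxnet.__version__` and compares only the leading numeric components, so suffixes such as `.post2` or `b20180201` are ignored; the helper names `version_tuple` and `mxnet_needs_upgrade` are hypothetical.

```python
import re
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. "1.0.0.post2" -> (1, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def mxnet_needs_upgrade(minimum: Tuple[int, ...] = (1, 1)) -> Optional[bool]:
    """True if the installed mxnet is older than `minimum`, None if mxnet is absent."""
    try:
        import mxnet as mx
    except ImportError:
        return None
    return version_tuple(mx.__version__) < minimum
```

Tuple comparison handles differing lengths: `(1, 1, 0) < (1, 1)` is False, so 1.1.0 counts as up to date. If it returns True, upgrade the package you originally installed (for example `pip install --upgrade mxnet`; GPU and MKL builds are published under different package names).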
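Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.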