
Launch Spark-Submit with restful service in Python

Following this tutorial, I've made a RESTful service in Python. Using this service I want to call another Python script with spark-submit, but it doesn't work.

Here is my service.py:

import pickle
import subprocess
from flask import Flask, request
from flask_restful import Resource, Api
from json import dumps
from flask_jsonpify import jsonify

app = Flask(__name__)
api = Api(app)

class Test(Resource):
    def post(self):
        imageID = request.form.get('imageID')
        tags = request.form.get('tags')

        return subprocess.call("spark-submit NaiveBayesClassifier.py",shell=True,stderr=subprocess.STDOUT)


api.add_resource(Test, '/test') 

if __name__ == '__main__':
    app.run(port=5002)

This service runs inside a virtualenv and is started like this:

source venv/bin/activate
python service.py

But when the script runs subprocess.call("spark-submit NaiveBayesClassifier.py",shell=True,stderr=subprocess.STDOUT), it gives me this error:

Running on http://127.0.0.1:5002/ (Press CTRL+C to quit)
OpenJDK 64-Bit Server VM warning: Insufficient space for shared memory file:
   34475
Try using the -Djava.io.tmpdir= option to select an alternate temp location.

OpenJDK 64-Bit Server VM warning: Insufficient space for shared memory file:
   34462
Try using the -Djava.io.tmpdir= option to select an alternate temp location.

Traceback (most recent call last):
  File "/home/usertest/project/NaiveBayesClassifier.py", line 2, in <module>
    import numpy
ImportError: No module named numpy
127.0.0.1 - - [23/Feb/2018 14:36:06] "POST /test HTTP/1.1" 200 -

Any ideas about the problem? I'm using Spark 1.6.1.

I see three problems.

First, in your code, you should use Popen in this way:

class Test(Resource):
    def post(self):
        imageID = request.form.get('imageID')
        tags = request.form.get('tags')

        p = subprocess.Popen(["spark-submit", "NaiveBayesClassifier.py"], stdout=subprocess.PIPE)
        return p.communicate()
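
Note that `communicate()` returns a `(stdout, stderr)` tuple of bytes, so it can be easier to hand the resource something JSON-friendly instead. The following is only a minimal sketch written for Python 3: `run_and_capture` is a hypothetical helper name, and a plain Python command stands in for `spark-submit` so the example runs without Spark installed.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return its exit code and combined output."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so errors are not lost
    )
    output, _ = proc.communicate()  # blocks until the process has finished
    return {
        "returncode": proc.returncode,
        "output": output.decode("utf-8", errors="replace"),
    }


if __name__ == "__main__":
    # Stand-in command; in the service this would be
    # ["spark-submit", "NaiveBayesClassifier.py"].
    result = run_and_capture([sys.executable, "-c", "print('hello')"])
    print(result)
```

In `post()`, returning the dictionary from `run_and_capture(["spark-submit", "NaiveBayesClassifier.py"])` should let the client see both the exit code and the Spark log output.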

Second, in your virtualenv you should install numpy with pip:

pip install numpy

or with sudo if you get an error:

sudo pip install numpy
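
If the ImportError persists after installing, it may be that the script is run by a different Python interpreter than the one in the virtualenv. This is an assumption, not something the traceback confirms. The sketch below (Python 3, hypothetical helper names `can_import` and `spark_env`) checks whether a given interpreter can import a module, and builds an environment in which PySpark's `PYSPARK_PYTHON` variable points at the interpreter running the service.

```python
import os
import subprocess
import sys


def can_import(python_exe, module):
    """Return True if the interpreter at python_exe can import module."""
    code = subprocess.call(
        [python_exe, "-c", "import " + module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return code == 0


def spark_env():
    """Copy the current environment and point PySpark at this interpreter."""
    env = os.environ.copy()
    env["PYSPARK_PYTHON"] = sys.executable
    return env


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("numpy importable:", can_import(sys.executable, "numpy"))
    # spark_env() could be passed as env=... to subprocess.Popen
    # when launching spark-submit from the service.
```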

Third, this message means that there is no more space on your disk. Try deleting some large files, or increase the size of your partition if you can.

warning: Insufficient space for shared memory file: 34475
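
To check how much space is actually left, something like the following can help. It is a small sketch that assumes the JVM writes to the same temp directory that Python reports through `tempfile.gettempdir()`, which may not hold if `java.io.tmpdir` has been overridden. `free_space_mb` is a hypothetical helper name.

```python
import shutil
import tempfile


def free_space_mb(path):
    """Return the free space, in megabytes, on the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 * 1024)


if __name__ == "__main__":
    tmp = tempfile.gettempdir()
    print("%s: %.1f MB free" % (tmp, free_space_mb(tmp)))
```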
