Launch spark-submit from a RESTful service in Python
Following this tutorial, I've made a RESTful service in Python. Using this service, I want to call another Python script with spark-submit, but it doesn't work.
Here is my service.py:
import pickle
import subprocess
from flask import Flask, request
from flask_restful import Resource, Api
from json import dumps
from flask_jsonpify import jsonify

app = Flask(__name__)
api = Api(app)

class Test(Resource):
    def post(self):
        imageID = request.form.get('imageID')
        tags = request.form.get('tags')
        return subprocess.call("spark-submit NaiveBayesClassifier.py", shell=True, stderr=subprocess.STDOUT)

api.add_resource(Test, '/test')

if __name__ == '__main__':
    app.run(port=5002)
This service runs inside a virtualenv and is started with:
source venv/bin/activate
python service.py
But when the script runs subprocess.call("spark-submit NaiveBayesClassifier.py", shell=True, stderr=subprocess.STDOUT), it returns this error:
Running on http://127.0.0.1:5002/ (Press CTRL+C to quit)
OpenJDK 64-Bit Server VM warning: Insufficient space for shared memory file:
34475
Try using the -Djava.io.tmpdir= option to select an alternate temp location.
OpenJDK 64-Bit Server VM warning: Insufficient space for shared memory file:
34462
Try using the -Djava.io.tmpdir= option to select an alternate temp location.
Traceback (most recent call last):
File "/home/usertest/project/NaiveBayesClassifier.py", line 2, in <module>
import numpy
ImportError: No module named numpy
127.0.0.1 - - [23/Feb/2018 14:36:06] "POST /test HTTP/1.1" 200 -
Any ideas about the problem? I'm using Spark 1.6.1.
I see three problems.
First, in your code you should use Popen this way:
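class Test(Resource):
    def post(self):
        imageID = request.form.get('imageID')
        tags = request.form.get('tags')
        # Pass the command as a list (no shell) and capture stdout + stderr
        p = subprocess.Popen(["spark-submit", "NaiveBayesClassifier.py"],
                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        out, _ = p.communicate()  # wait for spark-submit to finish
        # Decode the bytes so the response is JSON-serializable on Python 3 too
        return {"returncode": p.returncode, "output": out.decode("utf-8", "replace")}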
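Note that the log in the question ends with "POST /test HTTP/1.1" 200 even though the Spark job crashed on import. A minimal sketch of how the service could surface that failure, assuming you want a non-zero exit code from spark-submit reported as HTTP 500: the helper name run_and_report is hypothetical, and a small Python one-liner stands in for spark-submit so the sketch runs without Spark installed.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd (a list of args, no shell) and return (body, http_status).

    Hypothetical helper: a non-zero exit code becomes HTTP 500 so a failed
    job is not reported to the client as a success.
    """
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = p.communicate()  # wait for exit, collecting stdout and stderr
    body = {
        "returncode": p.returncode,
        "output": out.decode("utf-8", "replace"),
    }
    status = 200 if p.returncode == 0 else 500
    return body, status


if __name__ == "__main__":
    # Stand-ins for ["spark-submit", "NaiveBayesClassifier.py"]:
    # one command that succeeds and one that fails with an ImportError.
    ok_body, ok_status = run_and_report([sys.executable, "-c", "print('done')"])
    bad_body, bad_status = run_and_report([sys.executable, "-c", "import no_such_module_xyz"])
    print(ok_status, ok_body["output"].strip())
    print(bad_status, bad_body["returncode"])
```

Flask-RESTful accepts a (data, status) tuple from a resource method, so inside Test.post you could return run_and_report(["spark-submit", "NaiveBayesClassifier.py"]) and the client would see the failure along with the captured output.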
Second, you should install numpy in your virtualenv with pip:
pip install numpy
or use sudo if you get errors:
sudo pip install numpy
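The ImportError is raised by whichever Python interpreter actually ran NaiveBayesClassifier.py, and that is not necessarily the interpreter inside your virtualenv. A minimal sketch for checking whether a given interpreter can import numpy, assuming you know (or want to test) which executable that is; the helper name can_import is hypothetical and not part of pip or Spark.

```python
import subprocess
import sys


def can_import(python_exe, module):
    """Return True if the interpreter `python_exe` can import `module`.

    The import runs in a child process, so the result reflects that
    interpreter's installed packages, not the current process's.
    """
    result = subprocess.call(
        [python_exe, "-c", "import " + module],
        stdout=subprocess.DEVNULL,  # only the exit code matters here
        stderr=subprocess.DEVNULL,
    )
    return result == 0


if __name__ == "__main__":
    print(sys.executable, can_import(sys.executable, "numpy"))
```

For example, can_import("python", "numpy") checks whatever python resolves to on the service's PATH, while can_import(sys.executable, "numpy") checks the interpreter running service.py; if they disagree, numpy was installed into a different environment than the one running the script.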
Third, this message means that there's no more free space on your disk (the JVM creates this shared memory file in its temp directory, /tmp by default on Linux). Try to delete some large files, or enlarge the partition if you can.
warning: Insufficient space for shared memory file: 34475
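To confirm the diagnosis before deleting anything, you can check how much space is left where the JVM writes that file. A minimal sketch, assuming the JVM uses /tmp (its usual default on Linux) unless you point it elsewhere with the -Djava.io.tmpdir= option mentioned in the warning; the helper name free_mib is hypothetical.

```python
import os
import shutil
import tempfile


def free_mib(path="/tmp"):
    """Return the free space, in MiB, on the filesystem that holds `path`.

    The "/tmp" default is an assumption: it is the usual JVM temp
    directory on Linux, where the shared memory file is created.
    """
    return shutil.disk_usage(path).free // (1024 * 1024)


if __name__ == "__main__":
    # Check the assumed JVM temp dir and Python's own temp dir, if present.
    for p in {"/tmp", tempfile.gettempdir()}:
        if os.path.isdir(p):
            print(p, free_mib(p), "MiB free")
```

If the number is close to zero, either free up space on that filesystem or pass -Djava.io.tmpdir= with a directory on a partition that has room.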