
Python 3 and TensorFlow on AWS EMR

On EMR, AWS ships TensorFlow 1.9 as part of the software stack for release 5.17. I have my own bootstrap script that installs Python 3.6 and TensorFlow 1.9, so I took the TensorFlow installation out of it, but that didn't work: when I log in to the master node and run python3, I get my new Python 3.6, but there is no TensorFlow in that installation. I must have installed a separate Python. So my question is: how do I use the native python3 installation with TensorFlow on AWS EMR, together with Spark? And am I lucky enough for that python3 to be 3.6?

This is my bootstrap script:

#!/usr/bin/env bash

sudo yum -y upgrade
sudo yum -y install git autoconf automake libevent-devel python36.x86_64 python36-pip.noarch python36-devel.x86_64

sudo python36 -m pip install --upgrade pip

sudo python36 -m pip install --upgrade wheel cython

sudo python36 -m pip install py4j jupyter ipython pandas scipy pyyaml scikit-learn ipykernel matplotlib seaborn h5py configobj ujson

echo -e "\n\n" >> ~/.bashrc
echo 'export PYSPARK_PYTHON=/usr/bin/python36' >> ~/.bashrc

I guess I'll try it without a bootstrap script; maybe it will just work?

From AWS:

Amazon EMR release versions 4.6.0-5.19.0: Python 3.4 is installed on the cluster instances. Python 2.7 is the system default.

Amazon EMR release versions 5.20.0 and later: Python 3.6 is installed on the cluster instances. Python 2.7 is the system default.

You can install Python 3.6 using the following bootstrap action. I also recommend using virtualenv to run your Python scripts and keeping the list of required libraries on S3 in a requirements.txt file.

#!/bin/bash -xe
sudo yum install -y python36 python36-devel postgresql-devel unixODBC-devel # For pyodbc, psycopg2
virtualenv --system-site-packages /home/hadoop/workspace -p /usr/bin/python3.6 # Create the virtualenv on Python 3.6
source /home/hadoop/workspace/bin/activate
aws s3 cp s3://<bucket>/requirements.txt /home/hadoop/ # Keep your required pip freeze info (tensorflow, etc.) on S3
pip install -r /home/hadoop/requirements.txt # Install your packages into the virtualenv
# Run your scripts during main execution using /home/hadoop/workspace/bin/python3
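
Since the question also asks about Spark: one way to make PySpark use this virtualenv's interpreter is to set PYSPARK_PYTHON through EMR's spark-env configuration classification rather than through ~/.bashrc. The following is only a sketch of building that configuration JSON. The helper name spark_env_python_config is hypothetical, and it assumes the virtualenv path from the bootstrap action above exists on every node.

```python
import json


def spark_env_python_config(python_path):
    """Build an EMR configuration list that exports PYSPARK_PYTHON via spark-env.

    python_path is assumed to be an interpreter present on every cluster node.
    """
    return [
        {
            "Classification": "spark-env",
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"PYSPARK_PYTHON": python_path},
                }
            ],
        }
    ]


if __name__ == "__main__":
    # Path taken from the bootstrap action above; adjust if your virtualenv lives elsewhere.
    config = spark_env_python_config("/home/hadoop/workspace/bin/python3")
    print(json.dumps(config, indent=2))
```

The printed JSON would be supplied as the cluster's software configuration when the cluster is created.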

It looks like right now (Sept 2018) AWS EMR is at Python 3.4. Without a bootstrap script, I can run python3 and get their TensorFlow.
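
To see which interpreter on the master node actually has TensorFlow, a quick check along these lines may help. It is a rough sketch: the list of interpreter names is a guess at what might be on PATH, and probe_interpreter is a helper made up for illustration, not something EMR provides.

```python
import shutil
import subprocess
import sys

# One-liner executed inside each candidate interpreter: prints "<major.minor> <True|False>",
# where the boolean says whether a tensorflow package can be located.
PROBE = (
    "import sys, importlib.util;"
    "print('%d.%d' % sys.version_info[:2],"
    " importlib.util.find_spec('tensorflow') is not None)"
)


def probe_interpreter(name):
    """Return (path, version, has_tensorflow), or None if the interpreter is not found."""
    path = shutil.which(name)
    if path is None:
        return None
    # check_output is used instead of subprocess.run so this also works on Python 3.4.
    out = subprocess.check_output([path, "-c", PROBE], universal_newlines=True).split()
    return path, out[0], out[1] == "True"


if __name__ == "__main__":
    # Candidate names are assumptions; the current interpreter is included for comparison.
    for name in (sys.executable, "python3", "python3.4", "python36", "python3.6"):
        print(name, "->", probe_interpreter(name))
```

An interpreter that reports False here is a separate installation from the one where TensorFlow was installed, which matches the symptom described in the question.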
