
TensorFlow on Nvidia TX1

Has anyone gotten TensorFlow working on the Nvidia Tegra X1?

I've found a few sources indicating it's possible on the TK1, or possible with significant hacking/errors on the TX1, but no definitive recipe yet.

I am using a JetPack 2.3 install but haven't gotten it working yet; any tips are most appreciated.

Got TensorFlow R0.9 working on the TX1 with Bazel 0.2.1, CUDA 8.0, cuDNN 5.1, L4T 24.2, and a fresh JetPack 2.3 install. I've tested it with basic MLP, Conv, and LSTM nets using BN, Sigmoid, ReLU, etc., with no errors yet. I removed sparse_matmul_op, though otherwise I believe the compilation should be fully operational. Many of these steps come directly from MaxCuda's excellent blog, so huge thanks to them for providing it.

I plan to continue hammering on R0.10/R0.11 (the gRPC binary is blocking Bazel 0.3.0 right now), but until then I figured I'd post the R0.9 formula. As below:

First, get Java:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

Install some other dependencies:

sudo apt-get install git zip unzip autoconf automake libtool curl zlib1g-dev maven swig

You need to build the protobuf 3.0.0-beta-2 jar yourself:

git clone https://github.com/google/protobuf.git
cd protobuf
# autogen.sh downloads broken gmock.zip in d5fb408d
git checkout master
./autogen.sh
git checkout d5fb408d
./configure --prefix=/usr
make -j 4
sudo make install
cd java
mvn package

Get Bazel. We want version 0.2.1: unlike 0.3.0, it doesn't require the gRPC binary, which I can't build yet (maybe soon!).

git clone https://github.com/bazelbuild/bazel.git
cd bazel
git checkout 0.2.1
cp /usr/bin/protoc third_party/protobuf/protoc-linux-arm32.exe
cp ../protobuf/java/target/protobuf-java-3.0.0-beta-2.jar third_party/protobuf/protobuf-java-3.0.0-beta-1.jar

You need to edit a Bazel source file so that aarch64 is recognized as ARM:

--- a/src/main/java/com/google/devtools/build/lib/util/CPU.java
+++ b/src/main/java/com/google/devtools/build/lib/util/CPU.java
@@ -25,7 +25,7 @@ import java.util.Set;
 public enum CPU {
   X86_32("x86_32", ImmutableSet.of("i386", "i486", "i586", "i686", "i786", "x86")),
   X86_64("x86_64", ImmutableSet.of("amd64", "x86_64", "x64")),
-  ARM("arm", ImmutableSet.of("arm", "armv7l")),
+  ARM("arm", ImmutableSet.of("arm", "armv7l", "aarch64")),
   UNKNOWN("unknown", ImmutableSet.<String>of());

Now compile:

./compile.sh

And install:

sudo cp output/bazel /usr/local/bin

Get TensorFlow R0.9. Anything higher than R0.9 requires Bazel 0.3.0, which I haven't figured out how to build yet due to the gRPC issues.

git clone -b r0.9 https://github.com/tensorflow/tensorflow.git

Build once. It will fail, but now you have the Bazel .cache directory, where you can place updated config.guess and config.sub files that will figure out which architecture you're running:

./configure
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

cd ~
wget -O config.guess 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD'
wget -O config.sub 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD'

# below are commands I ran, yours will vary depending on .cache details. `find` is your friend
cp config.guess ./.cache/bazel/_bazel_socialh/742c01ff0765b098544431b60b1eed9f/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260/config.guess
cp config.sub ./.cache/bazel/_bazel_socialh/742c01ff0765b098544431b60b1eed9f/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260/config.sub
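Since the cache hash and external package paths differ per machine, a small find-based helper (my own sketch, not from the original answer; the SRC and CACHE paths are assumptions to adjust) can drop the updated files wherever copies of config.guess and config.sub appear:

```shell
# Overwrite every config.guess/config.sub under the Bazel cache with the
# freshly downloaded copies. SRC holds the new files; CACHE is the Bazel
# output root -- both paths are assumptions, adjust for your setup.
SRC="$HOME"
CACHE="$HOME/.cache/bazel"
for f in config.guess config.sub; do
  find "$CACHE" -name "$f" -exec cp "$SRC/$f" {} \;
done
```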

sparse_matmul_op had a couple of errors; I took the cowardly route and removed it from the build:

--- a/tensorflow/core/kernels/BUILD
+++ b/tensorflow/core/kernels/BUILD
@@ -985,7 +985,7 @@ tf_kernel_libraries(
         "reduction_ops",
         "segment_reduction_ops",
         "sequence_ops",
-        "sparse_matmul_op",
+        #DC "sparse_matmul_op",
     ],
     deps = [
         ":bounds_check",

--- a/tensorflow/python/BUILD
+++ b/tensorflow/python/BUILD
@@ -1110,7 +1110,7 @@ medium_kernel_test_list = glob([
     "kernel_tests/seq2seq_test.py",
     "kernel_tests/slice_op_test.py",
     "kernel_tests/sparse_ops_test.py",
-    "kernel_tests/sparse_matmul_op_test.py",
+    #DC "kernel_tests/sparse_matmul_op_test.py",
     "kernel_tests/sparse_tensor_dense_matmul_op_test.py",
 ])

The TX1 can't handle the fancy constructors in cwise_op_gpu_select.cu.cc:

--- a/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
+++ b/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
@@ -43,8 +43,14 @@ struct BatchSelectFunctor<GPUDevice, T> {
     const int all_but_batch = then_flat_outer_dims.dimension(1);

 #if !defined(EIGEN_HAS_INDEX_LIST)
-    Eigen::array<int, 2> broadcast_dims{{ 1, all_but_batch }};
-    Eigen::Tensor<int, 2>::Dimensions reshape_dims{{ batch, 1 }};
+    //DC Eigen::array<int, 2> broadcast_dims{{ 1, all_but_batch }};
+    Eigen::array<int, 2> broadcast_dims;
+    broadcast_dims[0] = 1;
+    broadcast_dims[1] = all_but_batch;
+    //DC Eigen::Tensor<int, 2>::Dimensions reshape_dims{{ batch, 1 }};
+    Eigen::Tensor<int, 2>::Dimensions reshape_dims;
+    reshape_dims[0] = batch;
+    reshape_dims[1] = 1;
 #else
     Eigen::IndexList<Eigen::type2index<1>, int> broadcast_dims;
     broadcast_dims.set(1, all_but_batch);

Same in sparse_tensor_dense_matmul_op_gpu.cu.cc:

--- a/tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc
@@ -104,9 +104,17 @@ struct SparseTensorDenseMatMulFunctor<GPUDevice, T, ADJ_A, ADJ_B> {
     int n = (ADJ_B) ? b.dimension(0) : b.dimension(1);

 #if !defined(EIGEN_HAS_INDEX_LIST)
-    Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz{{ 1, nnz }};
-    Eigen::array<int, 2> n_by_1{{ n, 1 }};
-    Eigen::array<int, 1> reduce_on_rows{{ 0 }};
+    //DC Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz{{ 1, nnz }};
+    Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz;
+    matrix_1_by_nnz[0] = 1;
+    matrix_1_by_nnz[1] = nnz;
+    //DC Eigen::array<int, 2> n_by_1{{ n, 1 }};
+    Eigen::array<int, 2> n_by_1;
+    n_by_1[0] = n;
+    n_by_1[1] = 1;
+    //DC Eigen::array<int, 1> reduce_on_rows{{ 0 }};
+    Eigen::array<int, 1> reduce_on_rows;
+    reduce_on_rows[0] = 0;
 #else
     Eigen::IndexList<Eigen::type2index<1>, int> matrix_1_by_nnz;
     matrix_1_by_nnz.set(1, nnz);

Running with CUDA 8.0 requires new macros for FP16. Many thanks to Kashif/Mrry for pointing out the fix!

--- a/tensorflow/stream_executor/cuda/cuda_blas.cc
+++ b/tensorflow/stream_executor/cuda/cuda_blas.cc
@@ -25,6 +25,12 @@ limitations under the License.
 #define EIGEN_HAS_CUDA_FP16
 #endif

+#if CUDA_VERSION >= 8000
+#define SE_CUDA_DATA_HALF CUDA_R_16F
+#else
+#define SE_CUDA_DATA_HALF CUBLAS_DATA_HALF
+#endif
+
 #include "tensorflow/stream_executor/cuda/cuda_blas.h"

 #include <dlfcn.h>
@@ -1680,10 +1686,10 @@ bool CUDABlas::DoBlasGemm(
   return DoBlasInternal(
       dynload::cublasSgemmEx, stream, true /* = pointer_mode_host */,
       CUDABlasTranspose(transa), CUDABlasTranspose(transb), m, n, k, &alpha,
-      CUDAMemory(a), CUBLAS_DATA_HALF, lda,
-      CUDAMemory(b), CUBLAS_DATA_HALF, ldb,
+      CUDAMemory(a), SE_CUDA_DATA_HALF, lda,
+      CUDAMemory(b), SE_CUDA_DATA_HALF, ldb,
       &beta,
-      CUDAMemoryMutable(c), CUBLAS_DATA_HALF, ldc);
+      CUDAMemoryMutable(c), SE_CUDA_DATA_HALF, ldc);
 #else
   LOG(ERROR) << "fp16 sgemm is not implemented in this cuBLAS version "
              << "(need at least CUDA 7.5)";

And lastly, ARM has no NUMA nodes, so the following needs to be added or you will get an immediate crash on starting tf.Session():

--- a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
@@ -888,6 +888,9 @@ CudaContext* CUDAExecutor::cuda_context() { return context_; }
 // For anything more complicated/prod-focused than this, you'll likely want to
 // turn to gsys' topology modeling.
 static int TryToReadNumaNode(const string &pci_bus_id, int device_ordinal) {
+  // DC - make this clever later. ARM has no NUMA node, just return 0
+  LOG(INFO) << "ARM has no NUMA node, hardcoding to return zero";
+  return 0;
 #if defined(__APPLE__)
   LOG(INFO) << "OS X does not support NUMA - returning NUMA node zero";
   return 0;

After these changes, build and install! Hope this is useful to some folks.

Follow Dwight's answer, but also create a swap file of at least 6 GB.

Following Dwight Crow's answer, but with an 8 GB swap file, the following command successfully built TensorFlow 0.9 on the Jetson TX1 from a fresh install of JetPack 2.3:

bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package
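The --local_resources triple caps Bazel's view of the machine at 3072 MB of RAM, 4 CPU cores, and full I/O capacity, which keeps the build from exhausting the TX1's 4 GB of memory. As a rough sketch (my own addition, not from the answer; the 75% ratio is an assumption), a conservative RAM figure can be derived from `free`:

```shell
# Derive a conservative RAM cap for --local_resources: roughly 75% of
# physical memory, in MB. The cores (4.0) and I/O (1.0) values are kept
# from the command above.
total_mb=$(free -m | awk '/^Mem:/ {print $2}')
ram_cap=$((total_mb * 3 / 4))
echo "--local_resources ${ram_cap},4.0,1.0"
```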

I used the default settings for TensorFlow's ./configure script, except to enable GPU support.

My build took at least 6 hours. It will be faster if you use an SSD instead of a USB drive.

Creating a swap file

# Create an 8 GB swapfile for Ubuntu at the current directory location
fallocate -l 8G swapfile
# List out the file
ls -lh swapfile
# Change permissions so that only root can use it
chmod 600 swapfile
# List out the file
ls -lh swapfile
# Set up the Linux swap area
mkswap swapfile
# Now start using the swapfile
sudo swapon swapfile
# Show that it's now being used
swapon -s
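Note that a swapfile enabled this way lasts only until reboot. If your build might span a restart, one option (my addition, not part of the original answer; the /media/usb path is a placeholder) is an /etc/fstab entry so the file is re-enabled at boot:

```shell
# Append a swap entry to /etc/fstab so the swapfile is activated at boot.
# The path below is hypothetical -- substitute wherever your swapfile lives.
echo '/media/usb/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```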

I used a USB drive to store my swap file.

The most memory I saw my system use was 7.7 GB (3.8 GB of Mem and 3.9 GB of Swap). The most swap memory I saw used at once was 4.4 GB. I used free -h to view memory usage.

Creating the pip package and installing

Adapted from the TensorFlow docs:

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

# The name of the .whl file will depend on your platform.
$ pip install /tmp/tensorflow_pkg/tensorflow-0.9.0-py2-none-any.whl

Acknowledgements

Thanks to Dwight Crow (guide), elirex (bazel option values and free -h), tylerfox (swap file idea and local_resources option), everyone that helped them, and everyone in the GitHub issue thread.

The swap file script was adapted from JetsonHacks' gist.

Errors I received when not using a swap file

(Listed here to help search engines find this answer.)

Error: unexpected EOF from Bazel server.

gcc: internal compiler error: Killed (program cc1plus)
