[英]TensorFlow on Nvidia TX1
任何人都在使用Nvidia Tegra X1進行張量流動?
我發現一些消息來源表明它可能存在於TK1上,或者TX1上存在嚴重的黑客攻擊/錯誤,但還沒有確定的配方。
我正在使用Jetson 2.3安裝但尚未使用它 - 任何提示最受歡迎。
TensorFlow R0.9使用Bazel 0.2.1,CUDA 8.0,CUDNN5.1,L4T24.2和新安裝的JetPack 2.3在TX1上運行。 我使用BN,Sigmoid,ReLU等基本的MLP,Conv和LSTM網絡進行了測試,但沒有任何錯誤。 我刪除了sparse_matmul_op,但認為編譯應該完全可操作。 其中許多步驟直接來自MaxCuda的優秀博客 ,非常感謝他們的提供。
計划繼續錘擊R0.10 / R0.11(gRPC二進制現在正在阻止Bazel 0.3.0),但在此之前我想發布了R0.9公式。 如下:
首先得到java
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
安裝其他一些deps
sudo apt-get install git zip unzip autoconf automake libtool curl zlib1g-dev maven swig
需要自己構建protobuf 3.0.0-beta-2 jar
git clone https://github.com/google/protobuf.git
cd protobuf
# autogen.sh downloads broken gmock.zip in d5fb408d
git checkout master
./autogen.sh
git checkout d5fb408d
./configure --prefix=/usr
make -j 4
sudo make install
cd java
mvn package
得到bazel。 我們想要版本0.2.1,它不需要gRPC二進制文件,不像0.3.0我無法構建(可能很快!)
git clone https://github.com/bazelbuild/bazel.git
cd bazel
git checkout 0.2.1
cp /usr/bin/protoc third_party/protobuf/protoc-linux-arm32.exe
cp ../protobuf/java/target/protobuf-java-3.0.0-beta-2.jar third_party/protobuf/protobuf-java-3.0.0-beta-1.jar
需要編輯bazel文件以將aarch64識別為ARM
--- a/src/main/java/com/google/devtools/build/lib/util/CPU.java
+++ b/src/main/java/com/google/devtools/build/lib/util/CPU.java
@@ -25,7 +25,7 @@ import java.util.Set;
public enum CPU {
X86_32("x86_32", ImmutableSet.of("i386", "i486", "i586", "i686", "i786", "x86")),
X86_64("x86_64", ImmutableSet.of("amd64", "x86_64", "x64")),
- ARM("arm", ImmutableSet.of("arm", "armv7l")),
+ ARM("arm", ImmutableSet.of("arm", "armv7l", "aarch64")),
UNKNOWN("unknown", ImmutableSet.<String>of());
現在編譯
./compile.sh
並安裝
sudo cp output/bazel /usr/local/bin
得到張量流R0.9。 高於R0.9需要Bazel 0.3.0,由於gRPC問題,我還沒有想出如何構建。
git clone -b r0.9 https://github.com/tensorflow/tensorflow.git
建立一次。 它會失敗,但現在你有了bazel .cache目錄,你可以放置更新的config.guess和config.sub文件,它們將顯示你正在運行的架構
./configure
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
cd ~
wget -O config.guess 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD'
wget -O config.sub 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD'
# below are commands I ran, yours will vary depending on .cache details. `find` is your friend
cp config.guess ./.cache/bazel/_bazel_socialh/742c01ff0765b098544431b60b1eed9f/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260/config.guess
cp config.sub ./.cache/bazel/_bazel_socialh/742c01ff0765b098544431b60b1eed9f/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260/config.sub
sparse_matmul_op有幾個錯誤,我采取了懦弱的路線並從構建中刪除
--- a/tensorflow/core/kernels/BUILD
+++ b/tensorflow/core/kernels/BUILD
@@ -985,7 +985,7 @@ tf_kernel_libraries(
"reduction_ops",
"segment_reduction_ops",
"sequence_ops",
- "sparse_matmul_op",
+ #DC "sparse_matmul_op",
],
deps = [
":bounds_check",
--- a/tensorflow/python/BUILD
+++ b/tensorflow/python/BUILD
@@ -1110,7 +1110,7 @@ medium_kernel_test_list = glob([
"kernel_tests/seq2seq_test.py",
"kernel_tests/slice_op_test.py",
"kernel_tests/sparse_ops_test.py",
- "kernel_tests/sparse_matmul_op_test.py",
+ #DC "kernel_tests/sparse_matmul_op_test.py",
"kernel_tests/sparse_tensor_dense_matmul_op_test.py",
])
TX1無法在cwise_op_gpu_select.cu.cc中執行花哨的構造函數
--- a/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
+++ b/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
@@ -43,8 +43,14 @@ struct BatchSelectFunctor<GPUDevice, T> {
const int all_but_batch = then_flat_outer_dims.dimension(1);
#if !defined(EIGEN_HAS_INDEX_LIST)
- Eigen::array<int, 2> broadcast_dims{{ 1, all_but_batch }};
- Eigen::Tensor<int, 2>::Dimensions reshape_dims{{ batch, 1 }};
+ //DC Eigen::array<int, 2> broadcast_dims{{ 1, all_but_batch }};
+ Eigen::array<int, 2> broadcast_dims;
+ broadcast_dims[0] = 1;
+ broadcast_dims[1] = all_but_batch;
+ //DC Eigen::Tensor<int, 2>::Dimensions reshape_dims{{ batch, 1 }};
+ Eigen::Tensor<int, 2>::Dimensions reshape_dims;
+ reshape_dims[0] = batch;
+ reshape_dims[1] = 1;
#else
Eigen::IndexList<Eigen::type2index<1>, int> broadcast_dims;
broadcast_dims.set(1, all_but_batch);
與sparse_tensor_dense_matmul_op_gpu.cu.cc相同
--- a/tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc
+++ b/tensorflow/core/kernels/sparse_tensor_dense_matmul_op_gpu.cu.cc
@@ -104,9 +104,17 @@ struct SparseTensorDenseMatMulFunctor<GPUDevice, T, ADJ_A, ADJ_B> {
int n = (ADJ_B) ? b.dimension(0) : b.dimension(1);
#if !defined(EIGEN_HAS_INDEX_LIST)
- Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz{{ 1, nnz }};
- Eigen::array<int, 2> n_by_1{{ n, 1 }};
- Eigen::array<int, 1> reduce_on_rows{{ 0 }};
+ //DC Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz{{ 1, nnz }};
+ Eigen::Tensor<int, 2>::Dimensions matrix_1_by_nnz;
+ matrix_1_by_nnz[0] = 1;
+ matrix_1_by_nnz[1] = nnz;
+ //DC Eigen::array<int, 2> n_by_1{{ n, 1 }};
+ Eigen::array<int, 2> n_by_1;
+ n_by_1[0] = n;
+ n_by_1[1] = 1;
+ //DC Eigen::array<int, 1> reduce_on_rows{{ 0 }};
+ Eigen::array<int, 1> reduce_on_rows;
+ reduce_on_rows[0] = 0;
#else
Eigen::IndexList<Eigen::type2index<1>, int> matrix_1_by_nnz;
matrix_1_by_nnz.set(1, nnz);
使用CUDA 8.0運行需要FP16的新宏。 非常感謝Kashif / Mrry指出修復!
--- a/tensorflow/stream_executor/cuda/cuda_blas.cc
+++ b/tensorflow/stream_executor/cuda/cuda_blas.cc
@@ -25,6 +25,12 @@ limitations under the License.
#define EIGEN_HAS_CUDA_FP16
#endif
+#if CUDA_VERSION >= 8000
+#define SE_CUDA_DATA_HALF CUDA_R_16F
+#else
+#define SE_CUDA_DATA_HALF CUBLAS_DATA_HALF
+#endif
+
#include "tensorflow/stream_executor/cuda/cuda_blas.h"
#include <dlfcn.h>
@@ -1680,10 +1686,10 @@ bool CUDABlas::DoBlasGemm(
return DoBlasInternal(
dynload::cublasSgemmEx, stream, true /* = pointer_mode_host */,
CUDABlasTranspose(transa), CUDABlasTranspose(transb), m, n, k, &alpha,
- CUDAMemory(a), CUBLAS_DATA_HALF, lda,
- CUDAMemory(b), CUBLAS_DATA_HALF, ldb,
+ CUDAMemory(a), SE_CUDA_DATA_HALF, lda,
+ CUDAMemory(b), SE_CUDA_DATA_HALF, ldb,
&beta,
- CUDAMemoryMutable(c), CUBLAS_DATA_HALF, ldc);
+ CUDAMemoryMutable(c), SE_CUDA_DATA_HALF, ldc);
#else
LOG(ERROR) << "fp16 sgemm is not implemented in this cuBLAS version "
<< "(need at least CUDA 7.5)";
最后,ARM沒有NUMA節點,因此需要添加或者在啟動tf.Session()時會立即崩潰
--- a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
@@ -888,6 +888,9 @@ CudaContext* CUDAExecutor::cuda_context() { return context_; }
// For anything more complicated/prod-focused than this, you'll likely want to
// turn to gsys' topology modeling.
static int TryToReadNumaNode(const string &pci_bus_id, int device_ordinal) {
+ // DC - make this clever later. ARM has no NUMA node, just return 0
+ LOG(INFO) << "ARM has no NUMA node, hardcoding to return zero";
+ return 0;
#if defined(__APPLE__)
LOG(INFO) << "OS X does not support NUMA - returning NUMA node zero";
return 0;
完成這些更改后,構建並安裝! 希望這對一些人有用。
繼Dwight Crow的答案之后,使用8 GB交換文件並使用以下命令在Jetson TX1上通過全新安裝的JetPack 2.3成功構建了TensorFlow 0.9:
bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package
除了啟用GPU支持外,我使用了TensorFlow的./configure
腳本的默認設置。
我的構建至少需要6個小時。 如果您使用SSD而不是USB驅動器,它會更快。
# Create a swapfile for Ubuntu at the current directory location
fallocate -l *G swapfile
# List out the file
ls -lh swapfile
# Change permissions so that only root can use it
chmod 600 swapfile
# List out the file
ls -lh swapfile
# Set up the Linux swap area
mkswap swapfile
# Now start using the swapfile
sudo swapon swapfile
# Show that it's now being used
swapon -s
我使用這個USB驅動器來存儲我的交換文件。
我看到我的系統使用的內存最多是7.7 GB(Mem上為3.8 GB,Swap上為3.9 GB)。 我見過的最多交換內存是4.4 GB。 我用free -h
來查看內存使用情況。
改編自TensorFlow文檔 :
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# The name of the .whl file will depend on your platform.
$ pip install /tmp/tensorflow_pkg/tensorflow-0.9.0-py2-none-any.whl
感謝Dwight Crow (指南), elirex (bazel選項值和free -h), tylerfox (交換文件構思和local_resources選項),幫助他們的每個人, 以及Github中的每個人發布帖子 。
交換文件腳本是根據JetsonHack的要點改編的 。
為了幫助搜索引擎找到這個答案。
Error: unexpected EOF from Bazel server.
gcc: internal compiler error: Killed (program cc1plus)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.