How to install libs for R arrow package on ubuntu without internet?

I am working on Azure Databricks, and its compute server runs Ubuntu 18.04. I want to install the arrow R package, but the server has no internet access for security reasons. I downloaded the arrow tar file on my MacBook, which has internet access, and made it available on the Ubuntu server for manual installation. I performed the following steps:

  1. Re-installed build-essential by downloading it from this link, uploading it to Ubuntu, and running the following bash command to make it available: sudo dpkg -i /dbfs/FileStore/tables/build_essential_12_4ubuntu1_amd64.deb
  2. Installed cpp11, as it is a dependency listed on CRAN: R CMD INSTALL /dbfs/FileStore/tables/arrow_dir/cpp11_0_3_1.tar.gz
  3. Downloaded arrow_4.0.1.tar.gz from here and made it available on Ubuntu.
  4. Here I see that the required C++ dependencies must be available on Ubuntu before installing the arrow package. How can I install these dependencies without access to the internet?

Thanks for reading my question.

Note: A solution is suggested below. After executing ./thirdparty/download_dependencies.sh $HOME/arrow-thirdparty I get:

# Environment variables for offline Arrow build
export ARROW_ABSL_URL=/root/arrow-thirdparty/absl-0f3bb466b868b523cf1dc9b2aaaed65c77b28862.tar.gz
export ARROW_AWSSDK_URL=/root/arrow-thirdparty/aws-sdk-cpp-1.8.133.tar.gz
export ARROW_AWS_CHECKSUMS_URL=/root/arrow-thirdparty/aws-checksums-v0.1.10
export ARROW_AWS_C_COMMON_URL=/root/arrow-thirdparty/aws-c-common-v0.5.10.tar.gz
export ARROW_AWS_C_EVENT_STREAM_URL=/root/arrow-thirdparty/aws-c-event-stream-v0.1.5
export ARROW_BOOST_URL=/root/arrow-thirdparty/boost-1.75.0.tar.gz
export ARROW_BROTLI_URL=/root/arrow-thirdparty/brotli-v1.0.9.tar.gz
export ARROW_BZIP2_URL=/root/arrow-thirdparty/bzip2-1.0.8.tar.gz
export ARROW_CARES_URL=/root/arrow-thirdparty/cares-1.17.1.tar.gz
export ARROW_GBENCHMARK_URL=/root/arrow-thirdparty/gbenchmark-v1.5.2.tar.gz
export ARROW_GFLAGS_URL=/root/arrow-thirdparty/gflags-v2.2.2.tar.gz
export ARROW_GLOG_URL=/root/arrow-thirdparty/glog-v0.4.0.tar.gz
export ARROW_GRPC_URL=/root/arrow-thirdparty/grpc-v1.35.0.tar.gz
export ARROW_GTEST_URL=/root/arrow-thirdparty/gtest-1.10.0.tar.gz
export ARROW_JEMALLOC_URL=/root/arrow-thirdparty/jemalloc-5.2.1.tar.bz2
export ARROW_LZ4_URL=/root/arrow-thirdparty/lz4-v1.9.3.tar.gz
export ARROW_MIMALLOC_URL=/root/arrow-thirdparty/mimalloc-v1.7.2.tar.gz
export ARROW_ORC_URL=/root/arrow-thirdparty/orc-1.6.6.tar.gz
Failed downloading https://github.com/google/protobuf/releases/download/v3.14.0/protobuf-all-3.14.0.tar.gz
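
The listing above stops at the failed protobuf download, so any archives the script would have fetched after that point are probably missing. A minimal sketch for checking which of the printed paths actually exist, assuming the script's output was saved to a file (the name arrow_env.sh and the function name check_arrow_env are hypothetical, not part of Arrow):

```shell
#!/usr/bin/env bash
# Sketch: report which ARROW_*_URL paths in a saved env file exist on disk.
# Assumes the lines printed by download_dependencies.sh were saved to a file
# (hypothetical name: arrow_env.sh). check_arrow_env is an illustrative helper.

check_arrow_env() {
  local env_file="$1" line name path missing=0
  while IFS= read -r line || [ -n "$line" ]; do
    # Only consider lines of the form: export ARROW_<LIB>_URL=<path>
    case "$line" in
      "export ARROW_"*"_URL="*) ;;
      *) continue ;;
    esac
    line="${line#export }"   # drop the leading "export "
    name="${line%%=*}"       # variable name, e.g. ARROW_BOOST_URL
    path="${line#*=}"        # local path to the downloaded archive
    if [ -s "$path" ]; then
      echo "ok      $name"
    else
      echo "MISSING $name -> $path"
      missing=$((missing + 1))
    fi
  done < "$env_file"
  # Succeed only if every listed archive is present and non-empty.
  [ "$missing" -eq 0 ]
}

# Usage (on the machine where the archives were downloaded):
#   check_arrow_env arrow_env.sh
```

Note that this only checks the variables that were printed. A dependency whose download failed before its export line was emitted (protobuf here) will not appear at all, so the error line in the output still has to be read by hand.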

Would it help to use the script mentioned in the link below to download the dependencies and put them somewhere you can then install them from?

There are some instructions here: https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds

I've pasted them below in case the link expires, but you may want to check it for the most up-to-date version of these instructions.

To enable offline builds, you can download the source artifacts yourself and use environment variables of the form ARROW_$LIBRARY_URL to direct the build system to read from a local file rather than accessing the internet.

To make this easier for you, we have prepared a script thirdparty/download_dependencies.sh which will download the correct version of each dependency to a directory of your choosing. It will print a list of bash-style environment variable statements at the end to use for your build script.

# Download tarballs into $HOME/arrow-thirdparty
$ ./thirdparty/download_dependencies.sh $HOME/arrow-thirdparty

You can then invoke CMake to create the build directory and it will use the declared environment variable pointing to downloaded archives instead of downloading them (one for each build dir!).
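
As a rough sketch of how the quoted steps fit together, the export lines can be saved to a file, loaded into the shell, and followed by the CMake call. The file name arrow_env.sh and the helper load_arrow_env are hypothetical; the CMake line is left as a comment because it assumes you are in the Arrow cpp/ source directory with a CMake version that supports -S/-B:

```shell
#!/usr/bin/env bash
# Sketch: load only the ARROW_*_URL export lines from a saved copy of the
# script's output. load_arrow_env and arrow_env.sh are illustrative names.

load_arrow_env() {
  local env_file="$1" line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      # Export NAME=value as-is; comments and error lines are skipped.
      "export ARROW_"*"_URL="*) export "${line#export }" ;;
    esac
  done < "$env_file"
}

# Usage, assuming the current directory is the Arrow cpp/ source directory:
#   ./thirdparty/download_dependencies.sh "$HOME/arrow-thirdparty" > arrow_env.sh
#   load_arrow_env arrow_env.sh
#   cmake -S . -B build -DARROW_DEPENDENCY_SOURCE=BUNDLED
```

The paths in the file are absolute, so if the archives are copied to another machine the paths need to match there, or the file has to be edited accordingly.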

Starting in arrow 6.0.0, the package should successfully install from source when offline. It will have only basic features: you'll be able to work with Arrow data and Feather files, but features like Parquet reading, S3, and compression libraries won't be available. There is also a new utility function, create_package_with_all_dependencies(), that you can run on a machine connected to the internet in order to produce a "fat" source package containing all third-party C++ dependencies. You can then copy this to your airgapped server. See https://arrow.apache.org/docs/r/reference/create_package_with_all_dependencies.html for details.
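
A minimal sketch of that two-machine workflow, wrapped in shell functions. It assumes arrow 6.0.0 or later is already installed in R on the connected machine, and that R-level dependencies such as cpp11 are installed separately on the offline server, as in the question. The function names and the file name my_arrow_pkg.tar.gz are placeholders:

```shell
#!/usr/bin/env bash
# Sketch only: function names and the default file name are illustrative.

# Run on the internet-connected machine. Assumes R with arrow >= 6.0.0.
build_fat_arrow_package() {
  local dest="${1:-my_arrow_pkg.tar.gz}"
  # Downloads the third-party C++ sources and bundles them into one tarball.
  Rscript -e "arrow::create_package_with_all_dependencies('${dest}')"
}

# Run on the air-gapped server after copying the tarball over.
install_fat_arrow_package() {
  local pkg="${1:-my_arrow_pkg.tar.gz}"
  R CMD INSTALL "$pkg"
}

# Usage:
#   build_fat_arrow_package my_arrow_pkg.tar.gz     # connected machine
#   install_fat_arrow_package my_arrow_pkg.tar.gz   # offline server
```

Building the bundled C++ libraries from source can take a while and still needs a working compiler toolchain and CMake on the server; see the linked reference page for the options that control which features are built.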
