How do I enable Snappy codec support in a Spark cluster launched with Google Cloud Dataproc?

When I try to read a Snappy-compressed sequence file from a Spark cluster launched with Google Cloud Dataproc, I receive the following warning:

java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.

What is the best way to enable Snappy codec support in this context?

Unfortunately, the launch image of Dataproc was built without Snappy support. I've opened a bug to get this fixed for the next image.

A workaround:

  1. First, create a small shell script that installs Snappy and the native library support for it. For this we'll use the same native libraries that bdutil uses. I called my script setup-snappy.sh:

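     #!/bin/bash
     # Work in a fresh temporary directory.
     pushd "$(mktemp -d)"
     # Install the Snappy shared library.
     apt-get install -q -y libsnappy1
     # Download the prebuilt Hadoop native libraries and unpack them into the Hadoop directory.
     wget https://storage.googleapis.com/hadoop-native-dist/Hadoop_2.7.1-Linux-amd64-64.tar.gz
     tar zxvf Hadoop_2.7.1-Linux-amd64-64.tar.gz -C /usr/lib/hadoop/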
  2. Copy the new shell script to a GCS bucket you own. For demonstration purposes, let's assume the bucket is dataproc-actions:

     gsutil cp ./setup-snappy.sh gs://dataproc-actions/setup-snappy.sh 
  3. When starting a cluster, specify initialization actions:

     gcloud beta dataproc clusters create --initialization-actions gs://dataproc-actions/setup-snappy.sh mycluster 
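
Once the cluster is up, it may be worth confirming that the native Snappy library was actually picked up. The sketch below is not part of the original workaround: `snappy_available` is a hypothetical helper that reads the output of `hadoop checknative -a` on stdin, and it assumes that the output contains a line of the form `snappy:  true <path>` when support is present.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): reads the output of
# `hadoop checknative -a` on stdin and succeeds only if the snappy line
# reports "true".
# Assumption: the output contains a line like "snappy:  true <path>".
snappy_available() {
  grep -Eq '^snappy:[[:space:]]+true'
}

# Example use on a cluster node (requires the hadoop CLI on the PATH):
#   hadoop checknative -a 2>&1 | snappy_available && echo "Snappy OK"
```

If the check fails, the output of the initialization action on the node is the first place to look.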

I have not done this myself but this post should solve your issue:

For installing and configuring other system-level components, bdutil supports an extension mechanism. A good example of an extension is the Spark extension bundled with bdutil: extensions/spark/spark_env.sh. When running bdutil, extensions are added with the -e flag; e.g., to deploy Spark with Hadoop:

./bdutil -e extensions/spark/spark_env.sh deploy
