
Spark installation for production, pip install or not?

I would like to install PySpark 2.4.4. I have seen that I can either download the Spark package or use pip install. I only need PySpark; are the two installations equivalent?

You can run pip install pyspark, but the pip package does not come with the Hadoop binaries, which Spark needs in order to function properly.
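To see which of the two setups is visible to a given interpreter, a small check along these lines may help. It is only a sketch: describe_pyspark_setup is a made-up helper name, and it assumes that a downloaded Spark distribution is advertised through the SPARK_HOME environment variable, which is the usual convention but not something this answer guarantees.

```python
import importlib
import importlib.util
import os


def describe_pyspark_setup(env=None):
    """Summarize how (or whether) PySpark is available to this interpreter.

    Hypothetical helper for illustration; not part of pyspark or findspark.
    """
    env = os.environ if env is None else env
    # find_spec returns None when no top-level 'pyspark' module can be found
    importable = importlib.util.find_spec("pyspark") is not None
    version = None
    if importable:
        # pyspark exposes its release string as pyspark.__version__
        version = importlib.import_module("pyspark").__version__
    return {
        "pyspark_importable": importable,
        "pyspark_version": version,
        # Assumption: a downloaded distribution is pointed to by SPARK_HOME
        "spark_home": env.get("SPARK_HOME"),
    }


if __name__ == "__main__":
    print(describe_pyspark_setup())
```

If pyspark_importable is True while spark_home is None, the interpreter is most likely using the pip package rather than a downloaded distribution.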

The easiest way to install it is with the Python findspark package:

Download the .tgz file from the Spark website, which comes with the Hadoop binaries, and extract it.

pip install findspark

In Python:

import findspark

# Point findspark at the extracted Spark folder; the raw string keeps the backslashes literal
findspark.init(r'\path\to\extracted\binaries\folder')

import pyspark
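The steps above can be wrapped in a small guard that checks the extracted folder before handing it to findspark. This is a sketch under stated assumptions: looks_like_spark_home and init_spark are hypothetical names, and the check assumes the standard layout of the binary download (a bin directory and a python/pyspark directory at the top level).

```python
import os


def looks_like_spark_home(path):
    """Return True if `path` has the layout of an extracted Spark download.

    Hypothetical helper; the checked entries (bin/ and python/pyspark/) are an
    assumption based on the standard binary distribution layout.
    """
    expected = [
        os.path.join(path, "bin"),
        os.path.join(path, "python", "pyspark"),
    ]
    return all(os.path.isdir(p) for p in expected)


def init_spark(path):
    """Validate the folder first, then pass it to findspark.init."""
    if not looks_like_spark_home(path):
        raise ValueError("not an extracted Spark folder: %r" % (path,))
    import findspark  # imported lazily so the layout check works without it
    findspark.init(path)
```

Failing early with a clear message is easier to debug than an ImportError on import pyspark when the path points at the wrong folder, for example at the .tgz file itself or at its parent directory.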
