
Pyspark: How to use --files tag for multiple files while running job on Yarn cluster

I am new to Spark and am writing jobs in Python with PySpark. I want to run my script on a YARN cluster and suppress the verbose logging by shipping a log4j.properties file that sets the logging level to WARN, using the --files flag. The script also reads a local CSV file, which I need to include as well. How do I use the --files flag to include both files?

I am using the following command:

/opt/spark/bin/spark-submit --master yarn --deploy-mode cluster --num-executors 50 --executor-cores 2 --executor-memory 2G --files /opt/spark/conf/log4j.properties ./list.csv ./read_parquet.py

But I get the following error:

Error: Cannot load main class from JAR file:/opt/spark/conf/./list.csv

You could remove the "." in front of the "/" for the second file... here I removed it and it is working:

/opt/spark/bin/spark-submit --master yarn --deploy-mode cluster --num-executors 50 --executor-cores 2 --executor-memory 2G --files /opt/spark/conf/log4j.properties /list.csv /read_parquet.py

You can pass comma-separated file paths in a single string, like this:

--files "filepath1,filepath2,filepath3" \

worked for me!!
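
Putting that together with the paths from the question, the full command would look something like the sketch below (a sketch, not a tested command). Note that --files takes a single comma-separated list, and everything after the options is treated as the application script plus its arguments, which is why ./list.csv was being parsed as the main application in the original command:

/opt/spark/bin/spark-submit --master yarn --deploy-mode cluster \
    --num-executors 50 --executor-cores 2 --executor-memory 2G \
    --files "/opt/spark/conf/log4j.properties,./list.csv" \
    ./read_parquet.py

Files shipped via --files are localized into the working directory of each YARN container, so inside read_parquet.py the CSV should be readable by its bare filename, e.g. open("list.csv").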
