I am new to Spark and writing jobs in Python with PySpark. I want to run my script on a YARN cluster and reduce the verbose logging by shipping a log4j.properties file (which sets the logging level to WARN) via the --files
flag. The script also reads a local CSV file, which I need to include as well. How do I use the --files
flag to include both files?
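For reference, a minimal log4j.properties that sets the root logging level to WARN might look like the following (a sketch based on the template Spark ships as conf/log4j.properties.template):

```properties
# Set root logger to WARN and log to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```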
I am using the following command:
/opt/spark/bin/spark-submit --master yarn --deploy-mode cluster --num-executors 50 --executor-cores 2 --executor-memory 2G --files /opt/spark/conf/log4j.properties ./list.csv ./read_parquet.py
But I get the following error: Error: Cannot load main class from JAR file:/opt/spark/conf/./list.csv
You could remove the "." in front of "/" for the second file. Here is the command with that removed, which worked for me:
/opt/spark/bin/spark-submit --master yarn --deploy-mode cluster --num-executors 50 --executor-cores 2 --executor-memory 2G --files /opt/spark/conf/log4j.properties /list.csv /read_parquet.py
You can pass multiple file paths to --files as a single comma-separated string, like this:
--files "filepath1,filepath2,filepath3" \
This worked for me!
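Applied to the command from the question, a sketch of the full invocation would be (paths assumed from the question; the CSV now goes under --files instead of being a positional argument, which is why spark-submit stopped treating it as the main application):

```shell
/opt/spark/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 50 \
  --executor-cores 2 \
  --executor-memory 2G \
  --files /opt/spark/conf/log4j.properties,./list.csv \
  ./read_parquet.py
```

In YARN cluster mode, files shipped with --files are placed in the working directory of the driver and each executor, so the script can open the CSV by its bare name (e.g. open("list.csv")) or resolve it with pyspark.SparkFiles.get("list.csv").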