
Spark wholeTextFiles difference between shell and app

I've copy-pasted a line that looks like this

val files = sc.wholeTextFiles("file:///path/to/files/*.csv")

from the Spark shell, where it runs, into an application, where it does not. Instead I get a message that the pattern matches 0 files, even though in the shell I can see all the files and Spark reads them.

What am I missing? Is this a file permissions problem?

I'm running the app as follows:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /usr/hdp/current/spark/conf/hive-site.xml \
  --num-executors 20 \
  --driver-memory 8G \
  --executor-memory 4G \
  --class com.myorg.pkg.MyApp \
  MyApp-assembly-0.1.jar

In order for this to work, every node that runs your code needs access to the file at that path. You are submitting with --deploy-mode cluster, so the driver and the executors run on cluster nodes, and a file:/// path is resolved against each node's local filesystem; if the files only exist on the machine you submit from, the pattern matches nothing.

One option would be to place the file on HDFS and provide the path as hdfs:///path/to/file.csv. This way all of the executors have access to it, as sketched below.
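
A minimal sketch of that approach, assuming the CSVs have already been uploaded to a hypothetical HDFS directory /data/input (for example with hdfs dfs -put /path/to/files/*.csv /data/input/):

// Read whole files from HDFS instead of a node-local path.
// Every executor can resolve this path because HDFS is shared across the cluster.
val files = sc.wholeTextFiles("hdfs:///data/input/*.csv")
files.keys.take(5).foreach(println)  // sanity check: print a few of the matched paths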

Another option would be to pass the file with the --files option of spark-submit. This ships the file to the working directory of every executor, so they all have access to a local copy; see the sketch below.
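
A sketch of the --files approach, assuming a hypothetical file submitted with --files /local/path/sample.csv. On the executors the shipped copy can be located by its bare file name via SparkFiles.get:

import org.apache.spark.SparkFiles

// Run the lookup inside a task so it resolves the copy shipped to each executor.
val lines = sc.parallelize(Seq(1)).flatMap { _ =>
  val localPath = SparkFiles.get("sample.csv") // local path of the shipped copy on this executor
  scala.io.Source.fromFile(localPath).getLines().toList
}
lines.take(5).foreach(println)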
