
Can't read file with "spark-submit" from terminal

I'm trying to run a .py file from the terminal with `spark-submit file.py`, but it doesn't work. However, if I run it with `python file.py`, it works.

This is the error:

 2018-11-08 17:06:51 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 2018-11-08 17:06:52 INFO  SparkContext:54 - Running Spark version 2.3.1
 2018-11-08 17:06:52 INFO  SparkContext:54 - Submitted application: hw3
 2018-11-08 17:06:52 INFO  SecurityManager:54 - Changing view acls to: dummy
 2018-11-08 17:06:52 INFO  SecurityManager:54 - Changing modify acls to: dummy
 2018-11-08 17:06:52 INFO  SecurityManager:54 - Changing view acls groups to:
 2018-11-08 17:06:52 INFO  SecurityManager:54 - Changing modify acls groups to:
 2018-11-08 17:06:52 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(vivianamarquez); groups with view permissions: Set(); users with modify permissions: Set(vivianamarquez); groups with modify permissions: Set()
 2018-11-08 17:06:52 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 57575.
 2018-11-08 17:06:52 INFO  SparkEnv:54 - Registering MapOutputTracker
 2018-11-08 17:06:52 INFO  SparkEnv:54 - Registering BlockManagerMaster
 2018-11-08 17:06:52 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
 2018-11-08 17:06:52 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
 2018-11-08 17:06:52 INFO  DiskBlockManager:54 - Created local directory at /private/var/folders/n7/q93jwpcs6jndz6qqvj4mhtcm0000gn/T/blockmgr-bc531d91-4ca0-4c93-afc2-5cf5c3389b86
 2018-11-08 17:06:52 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
 2018-11-08 17:06:52 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
 2018-11-08 17:06:52 INFO  log:192 - Logging initialized @1912ms
 2018-11-08 17:06:52 INFO  Server:346 - jetty-9.3.z-SNAPSHOT
 2018-11-08 17:06:52 INFO  Server:414 - Started @1978ms
 2018-11-08 17:06:52 INFO  AbstractConnector:278 - Started ServerConnector@7f04b8eb{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
 2018-11-08 17:06:52 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4871d3cc{/jobs,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3697e88c{/jobs/json,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@23ff21a8{/jobs/job,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@20c20340{/jobs/job/json,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@29985c5c{/stages,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7330daa6{/stages/json,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5febd2c2{/stages/stage,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7182c6b2{/stages/stage/json,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@70fe7782{/stages/pool,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7998b03{/stages/pool/json,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1552fba5{/storage,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@728208eb{/storage/json,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7143335e{/storage/rdd,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a496fe6{/storage/rdd/json,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@38c424d9{/environment,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5ae3a67a{/environment/json,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3252b7bb{/executors,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4395d848{/executors/json,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5aeece0f{/executors/threadDump,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1d79635e{/executors/threadDump/json,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a31e025{/static,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4d098d91{/,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@680392d9{/api,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bae8a18{/jobs/job/kill,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7e5f6ce6{/stages/stage/kill,null,AVAILABLE,@Spark}
 2018-11-08 17:06:52 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://10.1.152.221:4040
 2018-11-08 17:06:52 ERROR SparkContext:91 - Error initializing SparkContext.
 java.io.FileNotFoundException: File file:/Users/dummy/Desktop/hw.py does not exist
     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
     at org.apache.spark.SparkContext.addFile(SparkContext.scala:1529)
     at org.apache.spark.SparkContext.addFile(SparkContext.scala:1499)
     at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
     at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
     at scala.collection.immutable.List.foreach(List.scala:381)
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:461)
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
     at py4j.Gateway.invoke(Gateway.java:238)
     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
     at py4j.GatewayConnection.run(GatewayConnection.java:238)
     at java.lang.Thread.run(Thread.java:748)
 2018-11-08 17:06:52 INFO  AbstractConnector:318 - Stopped Spark@7f04b8eb{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
 2018-11-08 17:06:52 INFO  SparkUI:54 - Stopped Spark web UI at http://10.1.152.221:4040
 2018-11-08 17:06:52 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
 2018-11-08 17:06:52 INFO  MemoryStore:54 - MemoryStore cleared
 2018-11-08 17:06:52 INFO  BlockManager:54 - BlockManager stopped
 2018-11-08 17:06:52 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
 2018-11-08 17:06:52 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
 2018-11-08 17:06:52 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
 2018-11-08 17:06:52 INFO  SparkContext:54 - Successfully stopped SparkContext
 Traceback (most recent call last):
   File "/Users/dummy/Desktop/hw.py", line 6, in <module>
     sc = SparkContext(appName=app_name);
   File "/anaconda3/envs/ms69/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
   File "/anaconda3/envs/ms69/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 180, in _do_init
   File "/anaconda3/envs/ms69/lib/python3.6/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py", line 282, in _initialize_context
   File "/anaconda3/envs/ms69/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
   File "/anaconda3/envs/ms69/lib/python3.6/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
 py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
 : java.io.FileNotFoundException: File file:/Users/dummy/Desktop/hw.py does not exist
     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
     at org.apache.spark.SparkContext.addFile(SparkContext.scala:1529)
     at org.apache.spark.SparkContext.addFile(SparkContext.scala:1499)
     at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
     at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:461)
     at scala.collection.immutable.List.foreach(List.scala:381)
     at org.apache.spark.SparkContext.<init>(SparkContext.scala:461)
     at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
     at py4j.Gateway.invoke(Gateway.java:238)
     at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
     at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
     at py4j.GatewayConnection.run(GatewayConnection.java:238)
     at java.lang.Thread.run(Thread.java:748)
 2018-11-08 17:06:52 INFO  ShutdownHookManager:54 - Shutdown hook called
 2018-11-08 17:06:52 INFO  ShutdownHookManager:54 - Deleting directory /private/var/folders/n7/q93jwpcs6jndz6qqvj4mhtcm0000gn/T/spark-36742eed-5188-4642-a9db-29cb8efd0514
 2018-11-08 17:06:52 INFO  ShutdownHookManager:54 - Deleting directory /private/var/folders/n7/q93jwpcs6jndz6qqvj4mhtcm0000gn/T/spark-1b0c4122-4c22-46ba-840d-b1326bc0e840

Why does this happen? Help would be greatly appreciated!

You need to add all associated external files to the job, otherwise the executor containers won't be able to find them (unless you are reading them from HDFS). You can add them with `--files`:

spark-submit --files hw.py file.py

However, `--py-files` also adds them to the containers' PYTHONPATH, so you will probably prefer:

spark-submit --py-files hw.py file.py

When you run it with plain `python`, the driver and the executors are the same process, which is why it works there.
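The reason `--py-files` is usually what you want: Python can only `import` a module whose directory is on `sys.path` (PYTHONPATH). `--files` merely ships a copy of the file to each container, while `--py-files` also puts it on the executors' `sys.path`. A minimal pure-Python sketch of that mechanism (the module name `hw` and its contents are made up here for illustration):

```python
import os
import sys
import tempfile

# Create a throwaway module standing in for a dependency like hw.py
# (hypothetical name and contents, just to demonstrate sys.path lookup).
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "hw.py"), "w") as f:
    f.write("ANSWER = 42\n")

# Merely having a copy of the file somewhere (what --files does) is not
# enough for `import hw` to succeed; its directory must be on sys.path.
# Adding it there is effectively what --py-files arranges on each executor:
sys.path.insert(0, tmpdir)

import hw
print(hw.ANSWER)  # prints 42
```

So with `--py-files hw.py`, any `import hw` inside `file.py` resolves on every executor without you touching PYTHONPATH yourself.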
