java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. Spark Eclipse on Windows 7
I'm not able to run a simple Spark job in Scala IDE (a Maven Spark project) installed on Windows 7.

The Spark core dependency has been added.
val conf = new SparkConf().setAppName("DemoDF").setMaster("local")
val sc = new SparkContext(conf)
val logData = sc.textFile("File.txt")
logData.count()
Error:
16/02/26 18:29:33 INFO SparkContext: Created broadcast 0 from textFile at FrameDemo.scala:13
16/02/26 18:29:34 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:195)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
at com.org.SparkDF.FrameDemo$.main(FrameDemo.scala:14)
at com.org.SparkDF.FrameDemo.main(FrameDemo.scala)
Here is a good explanation of your problem, with the solution.

Set up your HADOOP_HOME environment variable at the OS level, or programmatically:

System.setProperty("hadoop.home.dir", "full path to the folder with winutils");

Enjoy
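As a minimal sketch of what this property controls (the class name and path here are illustrative, not from the original post): Hadoop's Shell helper joins hadoop.home.dir with \bin\winutils.exe, which is exactly why an unset value produces the "null\bin\winutils.exe" message in the stack trace above.

```java
public class HadoopHomeDemo {
    public static void main(String[] args) {
        // Illustrative path; point it at the folder that contains bin\winutils.exe.
        System.setProperty("hadoop.home.dir", "C:\\winutils");

        // Hadoop's Shell class resolves the binary as
        // <hadoop.home.dir>\bin\winutils.exe, so a null property becomes
        // the "null\bin\winutils.exe" path reported in the IOException.
        String resolved = System.getProperty("hadoop.home.dir") + "\\bin\\winutils.exe";
        System.out.println(resolved); // prints C:\winutils\bin\winutils.exe
    }
}
```

Set the property before the SparkContext is created, otherwise Hadoop's static initializer has already run and the check has already failed.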
Create the folder C:\winutils\bin.
Copy winutils.exe inside C:\winutils\bin.
Set HADOOP_HOME to C:\winutils.
Follow this:

1) Create a bin folder in any directory (to be used in step 3).
2) Download winutils.exe and place it in the bin directory.
3) Now add System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR"); in your code.
If you see the issue below:

ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

then do the following steps.
On Windows 10 you should add two different entries.

(1) Add the new variable and value, HADOOP_HOME and its path (i.e. C:\Hadoop), under System Variables.
(2) Add/append a new entry to the "Path" variable: "C:\Hadoop\bin".

The above worked for me.
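For clarity on how the system variable and the in-code property interact, here is a small sketch (the helper name is made up, but the precedence mirrors Hadoop 2.x's Shell.checkHadoopHome(), which reads the JVM property first and falls back to the environment variable):

```java
public class HadoopHomeCheck {
    // Mirrors Hadoop's lookup order: the JVM property "hadoop.home.dir"
    // takes precedence, then the HADOOP_HOME environment variable.
    static String resolveHadoopHome() {
        String home = System.getProperty("hadoop.home.dir");
        return (home != null) ? home : System.getenv("HADOOP_HOME");
    }

    public static void main(String[] args) {
        // "C:\\Hadoop" mirrors the example value used above; adjust as needed.
        System.setProperty("hadoop.home.dir", "C:\\Hadoop");
        String home = resolveHadoopHome();
        System.out.println(home == null ? "HADOOP_HOME is not set" : "Using " + home);
    }
}
```

Because the property wins over the environment variable, setting it in code is a reliable override when you cannot (or did not) change the system settings.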
Setting the Hadoop_Home environment variable in system properties didn't work for me. But this did:
I got the same problem while running unit tests. I found this workaround, which gets rid of the message:
// Point hadoop.home.dir at the working directory and create an empty
// bin/winutils.exe there so Hadoop's existence check passes
// (createNewFile() throws IOException, so declare or handle it).
File workaround = new File(".");
System.getProperties().put("hadoop.home.dir", workaround.getAbsolutePath());
new File("./bin").mkdirs();
new File("./bin/winutils.exe").createNewFile();
from: https://issues.cloudera.org/browse/DISTRO-544
You can alternatively download winutils.exe from GitHub:

https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin

Replace hadoop-2.7.1 with the version you want and place the file in D:\hadoop\bin.
If you do not have access rights to the environment variable settings on your machine, simply add the line below to your code:
System.setProperty("hadoop.home.dir", "D:\\hadoop");
1) Download winutils.exe from https://github.com/steveloughran/winutils
2) Create a directory in Windows: C:\winutils\bin
3) Copy winutils.exe into the above bin folder.
4) Set the property in code: System.setProperty("hadoop.home.dir", "C:\\winutils"); (hadoop.home.dir takes a plain filesystem path, not a file:/// URI)
5) Create the folder C:\temp and give it 777 permissions.
6) Add the config property to the Spark session: .config("spark.sql.warehouse.dir", "file:///C:/temp")
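The steps above can be sketched as a plain Java sequence (paths and the class name are illustrative; the Spark call itself is left as a comment so the sketch runs without Spark on the classpath, and a relative stand-in directory replaces C:\temp):

```java
import java.io.File;

public class SparkWindowsSetup {
    public static void main(String[] args) {
        // Steps 1-4: point hadoop.home.dir at the directory whose bin\ folder
        // contains winutils.exe (example value from the steps above).
        System.setProperty("hadoop.home.dir", "C:/winutils");

        // Step 5: make sure the warehouse directory exists before Spark touches it.
        // On Windows you would additionally grant permissions with:
        //   winutils.exe chmod 777 C:\temp
        File warehouse = new File("warehouse-demo"); // stand-in for C:\temp
        boolean ready = warehouse.mkdirs() || warehouse.isDirectory();
        System.out.println(ready); // prints true

        // Step 6 is then applied when building the session, e.g.:
        //   SparkSession.builder().config("spark.sql.warehouse.dir", "file:///C:/temp")
    }
}
```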
On top of setting your HADOOP_HOME environment variable in Windows to C:\winutils, you also need to make sure you are the administrator of the machine. If you are not, and adding environment variables prompts you for admin credentials (even under USER variables), then these variables will only take effect once you start your command prompt as administrator.
I also faced a similar problem, with the following details: Java 1.8.0_121, Spark spark-1.6.1-bin-hadoop2.6, Windows 10, and Eclipse Oxygen. When I ran my WordCount.java in Eclipse using HADOOP_HOME as a system variable, as mentioned in the previous post, it did not work. What worked for me is:

System.setProperty("hadoop.home.dir", "PATH/TO/THE/DIR");

where PATH/TO/THE/DIR/bin contains winutils.exe, whether you run within Eclipse as a Java application or by spark-submit from cmd using

spark-submit --class groupid.artifactid.classname --master local[2] /path to the jar file created using maven /path to a demo test file /path to output directory

Example: go to the bin location of Spark, Spark/home/location/bin, and execute spark-submit as mentioned:

D:\BigData\spark-2.3.0-bin-hadoop2.7\bin>spark-submit --class com.bigdata.abdus.sparkdemo.WordCount --master local[1] D:\BigData\spark-quickstart\target\spark-quickstart-0.0.1-SNAPSHOT.jar D:\BigData\spark-quickstart\wordcount.txt
That's a tricky one... your storage drive letter must be capital. For example "C:\..."