I am trying to read data from my HDFS at the location shown in the code below, but I'm not getting the data because a ConnectionException is thrown.
I'm attaching the log output as well. What should the port number for Hadoop be? Should I be using 50070?
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI

object random {
  def main(args: Array[String]): Unit = {
    System.setProperty("hadoop.home.dir", "D:\\Softwares\\Hadoop")
    val conf = new SparkConf().setMaster("local").setAppName("Hello")
    val sc = new SparkContext(conf)
    val hdfs = FileSystem.get(new URI("hdfs://104.211.213.47:50070/"), new Configuration())
    val path = new Path("/user/m1047068/retail/logerrors.txt")
    val stream = hdfs.open(path)
    // Lazily read lines; readLine returns null at end of file
    def readLines = Stream.cons(stream.readLine, Stream.continually(stream.readLine))
    // Check each line for null and print every existing line in sequence
    readLines.takeWhile(_ != null).foreach(println)
  }
}
--------------------------------------------------------------------------------
This is the log output I'm getting. I don't understand the exception, as I'm new to Spark.
2018-09-17 14:50:51 INFO SparkContext:54 - Running Spark version 2.3.0
2018-09-17 14:50:51 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-09-17 14:50:51 INFO SparkContext:54 - Submitted application: Hello
2018-09-17 14:50:51 INFO SecurityManager:54 - Changing view acls to: M1047068
2018-09-17 14:50:51 INFO SecurityManager:54 - Changing modify acls to: M1047068
2018-09-17 14:50:51 INFO SecurityManager:54 - Changing view acls groups to:
2018-09-17 14:50:51 INFO SecurityManager:54 - Changing modify acls groups to:
2018-09-17 14:50:51 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(M1047068); groups with view permissions: Set(); users with modify permissions: Set(M1047068); groups with modify permissions: Set()
2018-09-17 14:50:52 INFO Utils:54 - Successfully started service 'sparkDriver' on port 51772.
2018-09-17 14:50:52 INFO SparkEnv:54 - Registering MapOutputTracker
2018-09-17 14:50:52 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-09-17 14:50:52 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-09-17 14:50:52 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-09-17 14:50:52 INFO DiskBlockManager:54 - Created local directory at C:\Users\M1047068\AppData\Local\Temp\blockmgr-682d85a7-831e-4178-84de-5ade348a45f4
2018-09-17 14:50:52 INFO MemoryStore:54 - MemoryStore started with capacity 896.4 MB
2018-09-17 14:50:52 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2018-09-17 14:50:53 INFO log:192 - Logging initialized @3046ms
2018-09-17 14:50:53 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2018-09-17 14:50:53 INFO Server:414 - Started @3188ms
2018-09-17 14:50:53 INFO AbstractConnector:278 - Started ServerConnector@493dc226{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-09-17 14:50:53 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@16ce702d{/jobs,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@40238dd0{/jobs/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7776ab{/jobs/job,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@dbd8e44{/jobs/job/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@51acdf2e{/stages,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6a55299e{/stages/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2f1de2d6{/stages/stage,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3a0baae5{/stages/stage/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7ac0e420{/stages/pool,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@289710d9{/stages/pool/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5a18cd76{/storage,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3da30852{/storage/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@403f0a22{/storage/rdd,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@503ecb24{/storage/rdd/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4c51cf28{/environment,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6995bf68{/environment/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5143c662{/executors,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@77825085{/executors/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3568f9d2{/executors/threadDump,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@71c27ee8{/executors/threadDump/json,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e7dd664{/static,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4748a0f9{/,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4b14918a{/api,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@77d67cf3{/jobs/job/kill,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6dee4f1b{/stages/stage/kill,null,AVAILABLE,@Spark}
2018-09-17 14:50:53 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://G1C2ML15621.mindtree.com:4040
2018-09-17 14:50:53 INFO Executor:54 - Starting executor ID driver on host localhost
2018-09-17 14:50:53 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51781.
2018-09-17 14:50:53 INFO NettyBlockTransferService:54 - Server created on G1C2ML15621.mindtree.com:51781
2018-09-17 14:50:53 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-09-17 14:50:53 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, G1C2ML15621.mindtree.com, 51781, None)
2018-09-17 14:50:53 INFO BlockManagerMasterEndpoint:54 - Registering block manager G1C2ML15621.mindtree.com:51781 with 896.4 MB RAM, BlockManagerId(driver, G1C2ML15621.mindtree.com, 51781, None)
2018-09-17 14:50:53 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, G1C2ML15621.mindtree.com, 51781, None)
2018-09-17 14:50:53 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, G1C2ML15621.mindtree.com, 51781, None)
2018-09-17 14:50:53 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6cbcf243{/metrics/json,null,AVAILABLE,@Spark}
Exception in thread "main" java.net.ConnectException: Call From G1C2ML15621/172.17.124.224 to 104.211.213.47:50070 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at random$.main(random.scala:20)
at random.main(random.scala)
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
... 25 more
2018-09-17 14:51:00 INFO SparkContext:54 - Invoking stop() from shutdown hook
2018-09-17 14:51:00 INFO AbstractConnector:318 - Stopped Spark@493dc226{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2018-09-17 14:51:00 INFO SparkUI:54 - Stopped Spark web UI at http://G1C2ML15621.mindtree.com:4040
2018-09-17 14:51:00 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2018-09-17 14:51:00 INFO MemoryStore:54 - MemoryStore cleared
2018-09-17 14:51:00 INFO BlockManager:54 - BlockManager stopped
2018-09-17 14:51:00 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2018-09-17 14:51:00 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2018-09-17 14:51:00 INFO SparkContext:54 - Successfully stopped SparkContext
2018-09-17 14:51:00 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-17 14:51:00 INFO ShutdownHookManager:54 - Deleting directory C:\Users\M1047068\AppData\Local\Temp\spark-84d5b3c8-a609-42da-8e5e-5492400f309d
Spark can't read from WebHDFS; 50070 is the NameNode's HTTP port, not its RPC port.
You need to use the port number that appears in the fs.defaultFS property in your core-site.xml.
Also, you don't need to set the hadoop.home.dir property if you copy your Hadoop XML config files into the conf folder of the Spark installation and define the HADOOP_CONF_DIR environment variable.
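For reference, the value to match is the fs.defaultFS entry in core-site.xml on the cluster. A typical entry looks like this (8020 is only a common default here, not your actual value — check your own file):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://104.211.213.47:8020</value>
  </property>
</configuration>
```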
And as of Spark 2, you want to be using SparkSession, and from a session you would use its read.textFile method to read a file.
You should never need to create a raw FileSystem object yourself in Spark.
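A minimal sketch of that approach (assuming the NameNode's RPC port is 8020 — substitute whatever your fs.defaultFS actually says; this needs a reachable cluster to run):

```scala
import org.apache.spark.sql.SparkSession

object ReadFromHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("Hello")
      .getOrCreate()

    // The port here must be the RPC port from fs.defaultFS, not 50070.
    // If HADOOP_CONF_DIR is set, a plain "/user/..." path also works,
    // since Spark resolves it through the Hadoop configuration.
    val lines = spark.read.textFile("hdfs://104.211.213.47:8020/user/m1047068/retail/logerrors.txt")
    lines.collect().foreach(println)

    spark.stop()
  }
}
```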