
Using files from HDFS into Apache Spark

I have a few files in my HDFS and I want to use them in Spark. I am able to see my files when I run the following command:

bin/hadoop dfs -ls /input

How should I give the path of these files in Spark to create an RDD:

val input = sc.textFile("???")

If your Spark installation is properly configured, then your normal HDFS paths should work verbatim, unchanged, in Spark as well:

val input = sc.textFile("/input")
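If the default filesystem is not being resolved for some reason, you can also pass a fully qualified HDFS URI to textFile. A minimal sketch, assuming a NameNode reachable at namenode-host on port 8020 (both are placeholders for your cluster's actual values):

// Fully qualified HDFS URI; "namenode-host" and 8020 are placeholders
// for your cluster's NameNode host and RPC port.
val input = sc.textFile("hdfs://namenode-host:8020/input")
input.take(5).foreach(println)  // quick sanity check on the first few lines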

If that's not working, then you probably need to make sure your Spark configuration is correctly picking up your Hadoop conf dir.
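One quick way to check whether the Hadoop configuration was picked up is to inspect fs.defaultFS from the spark-shell; a minimal sketch:

// Print which default filesystem Spark resolved from the Hadoop configuration.
// If this prints "file:///" rather than an hdfs:// URI, the Hadoop conf dir
// (e.g. via HADOOP_CONF_DIR) is probably not visible to Spark.
println(sc.hadoopConfiguration.get("fs.defaultFS"))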

You might also want to try directly checking your file listings from your Spark code to make sure the configurations are getting imported properly:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs._

// List the contents of /input through the Hadoop FileSystem API;
// if this fails or resolves to the local filesystem, the Hadoop
// configuration is not being picked up.
val path = new Path("/input")
path.getFileSystem(new Configuration()).listStatus(path)
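In the spark-shell it can also be more reliable to reuse the Hadoop configuration that Spark itself loaded, rather than constructing a fresh one; a sketch of the same check using sc.hadoopConfiguration:

// Reuse the Hadoop configuration Spark already loaded, then print
// the paths it can see under /input.
val fs = FileSystem.get(sc.hadoopConfiguration)
fs.listStatus(new Path("/input")).foreach(status => println(status.getPath))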
