
Read csv file from Hadoop using Spark

I'm using spark-shell to read csv files from HDFS. I can read the csv file using the following command in bash:

bin/hadoop fs -cat /input/housing.csv | tail -5

so this suggests that housing.csv is indeed in HDFS right now. How can I read it using spark-shell? Thanks in advance.

sc.textFile("hdfs://input/housing.csv").first()

I tried this, but it failed.
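The call fails because in hdfs://input/housing.csv the segment "input" is parsed as the NameNode host, not as a directory. A minimal sketch of the corrected RDD call, assuming fs.defaultFS in core-site.xml points at your cluster:

// With the scheme omitted, the path resolves against fs.defaultFS
sc.textFile("/input/housing.csv").first()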

Include the csv package in the shell and run:

val df = spark.read.format("csv").option("header", "true").load("hdfs://x.x.x.x:8020/folder/file.csv")

8020 is the default HDFS NameNode port.
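Once it loads, you can sanity-check the result; a minimal usage sketch with the df from above:

df.printSchema()  // with option("header", "true") the column names come from the first row
df.show(5)        // print the first five rows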

Thanks, Ash

You can read this easily with Spark using the csv method or by specifying format("csv"). In your case, either leave out the hdfs:// scheme or specify the complete path, hdfs://localhost:8020/input/housing.csv.

Here is a snippet of code that can read the csv file:

val df = spark.
        read.
        schema(dataSchema).   // dataSchema: a StructType describing the file's columns
        csv("/input/housing.csv")
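Note that dataSchema must be defined before this runs. A minimal sketch, with hypothetical field names (replace them with the actual columns of housing.csv):

import org.apache.spark.sql.types._

// hypothetical fields, for illustration only
val dataSchema = StructType(Seq(
  StructField("median_age", DoubleType, nullable = true),
  StructField("median_house_value", DoubleType, nullable = true)
))

If you would rather not spell out a schema, Spark can also infer one: spark.read.option("header", "true").option("inferSchema", "true").csv("/input/housing.csv").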
