I'm using spark-shell to read CSV files from HDFS. I can read the CSV file with the following command in bash:
bin/hadoop fs -cat /input/housing.csv |tail -5
So this suggests that housing.csv is indeed in HDFS right now. How can I read it using spark-shell? Thanks in advance.
sc.textFile("hdfs://input/housing.csv").first()
I tried it this way, but it failed.
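Include the CSV package in the shell (the CSV reader is built in from Spark 2.0 onward), then load the file using the full NameNode address: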
var df = spark.read.format("csv").option("header", "true").load("hdfs://x.x.x.x:8020/folder/file.csv")
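8020 is the default NameNode port; if your cluster uses a different one, take the host and port from fs.defaultFS in core-site.xml.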
Thanks, Ash
You can read this easily with Spark using the csv method or by specifying format("csv"). In your case, hdfs://input/housing.csv makes Spark treat "input" as the NameNode host name, so either omit the hdfs:// prefix or specify the complete path, hdfs://localhost:8020/input/housing.csv.
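To see why the original path fails, it may help to look at how each form parses as a URI. This is only a small sketch using java.net.URI from the JDK; the value names (wrong, full, noHost) are made up for illustration, and localhost:8020 is taken from the answer above rather than from your actual cluster configuration.

```scala
import java.net.URI

// The path from the question: "input" lands in the host slot, not in the path.
val wrong = new URI("hdfs://input/housing.csv")

// Fully qualified form: host and port first, then the absolute HDFS path.
val full = new URI("hdfs://localhost:8020/input/housing.csv")

// Three slashes leave the host empty, so the path stays intact.
val noHost = new URI("hdfs:///input/housing.csv")

println(s"wrong:  host=${wrong.getHost}, path=${wrong.getPath}")
println(s"full:   host=${full.getHost}, port=${full.getPort}, path=${full.getPath}")
println(s"noHost: host=${noHost.getHost}, path=${noHost.getPath}")
```

Pasting this into spark-shell shows that the first form loses the /input directory to the host part, which is why the file cannot be found.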
Here is a snippet of code that can read csv.
// dataSchema must be a StructType, defined beforehand, that describes the CSV columns
val df = spark.read
  .schema(dataSchema)
  .csv("/input/housing.csv")
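The snippet above relies on a dataSchema value that is not shown. A minimal sketch of what such a schema could look like follows; the two column names and types are hypothetical, since the header of housing.csv does not appear in the question, so replace them with the real columns.

```scala
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// Hypothetical columns: the actual layout of housing.csv is not given in the question.
val dataSchema = StructType(Seq(
  StructField("median_house_value", DoubleType, nullable = true),
  StructField("ocean_proximity", StringType, nullable = true)
))

// Print the field names and types as a quick sanity check.
dataSchema.printTreeString()
```

Supplying a schema this way avoids the extra pass over the file that schema inference would need. If the file has a header row, you would probably also want option("header", "true") on the reader so the header is not read as data.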