
Reading csv files to create dataframe

I am trying to read a CSV file to create a DataFrame ( https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html )

Using:

spark-1.3.1-bin-hadoop2.6
spark-csv_2.11-1.1.0
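
Note that the `_2.11` suffix in `spark-csv_2.11-1.1.0` denotes the Scala version the artifact was built for, while the prebuilt `spark-1.3.1-bin-hadoop2.6` distribution is compiled against Scala 2.10, so the `_2.10` artifact would be the matching one. A hedged example of pulling it in when launching the shell (coordinates as published for spark-csv 1.1.0):

```
spark-shell --packages com.databricks:spark-csv_2.10:1.1.0
```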

Code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object test {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("test")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.csvFile("filename.csv")
    ...
  }
}

Error:

value csvFile is not a member of org.apache.spark.sql.SQLContext

I was trying to do as advised here: Spark - load CSV file as DataFrame?

But sqlContext doesn't seem to recognize the csvFile method of the CsvContext class.
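
In spark-csv 1.x, `csvFile` is not defined on `SQLContext` itself; it is added by an implicit conversion in the `com.databricks.spark.csv` package object, so the wildcard import must be in scope. A minimal sketch, assuming the spark-csv jar is actually on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
// brings the implicit CsvContext wrapper (and its csvFile method) into scope
import com.databricks.spark.csv._

val sc = new SparkContext(new SparkConf().setAppName("test"))
val sqlContext = new SQLContext(sc)
val df = sqlContext.csvFile("filename.csv")
```

Without that import the compiler sees only the plain `SQLContext`, which is exactly the "value csvFile is not a member" error above.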

Any advice would be appreciated!

I am also facing some issues with CSV (without Spark-CSV), but here are some things you can look at and check:

  1. Build the Spark shell with the spark-csv library using sbt assembly.
  2. Add the spark-csv dependency to the pom.xml of your Maven project.
  3. Use the load/save methods of the DataFrame API.
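
For step 3, Spark 1.3's generic `load`/`save` methods can go through the spark-csv data source without the `csvFile` helper. A sketch, assuming an existing SparkContext `sc`, a header row in the file, and the asker's placeholder path `filename.csv`:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// read via the spark-csv data source by its fully qualified name
val df = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "filename.csv", "header" -> "true"))
// write back out through the same data source
df.save("output.csv", "com.databricks.spark.csv")
```

This route only needs the library on the classpath, not the implicit import.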

SPARK-CSV GITHUB

Refer to the spark-csv GitHub README.md page and you will be up and running :)
