
Reading csv files to create dataframe

I am trying to read a CSV file to create a DataFrame ( https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html )

Using:

spark-1.3.1-bin-hadoop2.6
spark-csv_2.11-1.1.0
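
Note that the `_2.11` suffix in `spark-csv_2.11-1.1.0` denotes the Scala version the artifact was built for, while the prebuilt `spark-1.3.1-bin-hadoop2.6` distribution is compiled against Scala 2.10, so the `_2.10` artifact would be the matching one. A hedged example of pulling it in when launching the shell (coordinates as published for spark-csv 1.1.0):

```
spark-shell --packages com.databricks:spark-csv_2.10:1.1.0
```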

Code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object test {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("test")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.csvFile("filename.csv")
    ...
  }
}

Error:

value csvFile is not a member of org.apache.spark.sql.SQLContext

I was trying to do as advised here: Spark - load CSV file as DataFrame?

But sqlContext doesn't seem to recognize the csvFile method of the CsvContext class.
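
In spark-csv 1.x, `csvFile` is not defined on `SQLContext` itself; it is added by an implicit conversion in the `com.databricks.spark.csv` package object, so the wildcard import must be in scope. A minimal sketch, assuming the spark-csv jar is actually on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
// brings the implicit CsvContext wrapper (and its csvFile method) into scope
import com.databricks.spark.csv._

val sc = new SparkContext(new SparkConf().setAppName("test"))
val sqlContext = new SQLContext(sc)
val df = sqlContext.csvFile("filename.csv")
```

Without that import the compiler sees only the plain `SQLContext`, which is exactly the "value csvFile is not a member" error above.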

Any advice would be appreciated!

I am also facing some issues with CSV (without Spark-CSV), but here are some things you can look at and check:

  1. Build the Spark shell with the spark-csv library using sbt assembly.
  2. Add the spark-csv dependency to the pom.xml of your Maven project.
  3. Use the load/save methods of the DataFrame API.
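
For step 3, Spark 1.3's generic `load`/`save` methods can go through the spark-csv data source without the `csvFile` helper. A sketch, assuming an existing SparkContext `sc`, a header row in the file, and the asker's placeholder path `filename.csv`:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// read via the spark-csv data source by its fully qualified name
val df = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "filename.csv", "header" -> "true"))
// write back out through the same data source
df.save("output.csv", "com.databricks.spark.csv")
```

This route only needs the library on the classpath, not the implicit import.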

SPARK-CSV GITHUB

Refer to the spark-csv GitHub README.md page and you will be up and running :)
