Spark & Scala: Read in CSV file as DataFrame / Dataset
Coming from the R world, I want to read a .csv into Spark using the Scala shell (`./spark-shell`).
My .csv has a header and looks like:
"col1","col2","col3"
1.4,"abc",91
1.3,"def",105
1.35,"gh1",104
Thanks.
Spark 2.0+
Since databricks/spark-csv has been integrated into Spark, reading CSVs is straightforward using a SparkSession:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local")
  .appName("Word Count")
  .getOrCreate()
val df = spark.read.option("header", true).csv(path)
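Since the title also asks about Datasets: once you have a DataFrame, you can turn it into a typed Dataset with a case class. A minimal sketch, assuming the sample CSV above; the case class name `Record` and the `path` value are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class matching the sample CSV's three columns.
case class Record(col1: Double, col2: String, col3: Int)

val spark = SparkSession.builder()
  .master("local")
  .appName("CSV to Dataset")
  .getOrCreate()

// Needed for the .as[Record] encoder below.
import spark.implicits._

val path = "mydata.csv" // hypothetical path to the CSV file

// inferSchema makes the inferred column types line up with the case class fields.
val ds = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(path)
  .as[Record] // typed Dataset[Record]
```

With `ds` you get compile-time field access (`ds.map(_.col1)`) instead of untyped `Row` columns.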
Older versions
After restarting my spark-shell I figured it out myself; this may help others:
Install as described there and start the shell with ./spark-shell --packages com.databricks:spark-csv_2.11:1.4.0
After starting spark-shell:
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val df = sqlContext.read.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", "true")
.load("/home/vb/opt/spark/data/mllib/mydata.csv")
scala> df.printSchema()
root
|-- col1: double (nullable = true)
|-- col2: string (nullable = true)
|-- col3: integer (nullable = true)
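If you already know the column types, you can supply the schema explicitly instead of using `inferSchema`, which avoids an extra pass over the file. A sketch against the same old spark-csv API, using the schema printed above:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types._

val sqlContext = new SQLContext(sc)

// Schema matching the printSchema() output above.
val schema = StructType(Seq(
  StructField("col1", DoubleType, nullable = true),
  StructField("col2", StringType, nullable = true),
  StructField("col3", IntegerType, nullable = true)
))

val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .schema(schema) // skip type inference entirely
  .load("/home/vb/opt/spark/data/mllib/mydata.csv")
```

The same `.schema(...)` call works with `spark.read.csv` in Spark 2.0+.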