
How to convert RDD[some case class] to csv file using scala?

I have an RDD[some case class] and I want to convert it to a csv file. I am using Spark 1.6 and Scala 2.10.5.

stationDetails.toDF.coalesce(1).write.format("com.databricks.spark.csv").save("data/myData.csv")

This gives the error:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:219)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)

I am not able to add the "com.databricks.spark.csv" dependency in my build.sbt file.

The dependencies I have added in my build.sbt file are:

libraryDependencies ++= Seq(
  "org.apache.commons" % "commons-csv" % "1.1",
  "com.univocity" % "univocity-parsers" % "1.5.1",
  "org.slf4j" % "slf4j-api" % "1.7.5" % "provided",
  "org.scalatest" %% "scalatest" % "2.2.1" % "test",
  "com.novocode" % "junit-interface" % "0.9" % "test"
)

I have also tried

stationDetails.toDF.coalesce(1).write.csv("data/myData.csv")

but it gives the error that csv cannot be resolved.

Spark 1.6 has no built-in csv data source (DataFrameWriter.csv was only added in Spark 2.0), which is why your second attempt does not compile; you need the external spark-csv package. Please change your build.sbt to the following -

libraryDependencies ++= Seq(
  "org.apache.commons" % "commons-csv" % "1.1",
  "com.databricks" %% "spark-csv" % "1.4.0",
  "com.univocity" % "univocity-parsers" % "1.5.1",
  "org.slf4j" % "slf4j-api" % "1.7.5" % "provided",
  "org.scalatest" %% "scalatest" % "2.2.1" % "test",
  "com.novocode" % "junit-interface" % "0.9" % "test"
)
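
With the spark-csv dependency on the classpath, the format("com.databricks.spark.csv") call from the question resolves. Below is a minimal, self-contained sketch for Spark 1.6, assuming a hypothetical StationDetail case class and a local master in place of your actual setup:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical stand-in for the question's "some case class".
case class StationDetail(id: Int, name: String)

object CsvWriteExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("csv-write").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings .toDF into scope for RDDs of case classes

    val stationDetails = sc.parallelize(Seq(StationDetail(1, "north"), StationDetail(2, "south")))

    // Same call as in the question; it now resolves because spark-csv is on the classpath.
    stationDetails.toDF()
      .coalesce(1)
      .write
      .format("com.databricks.spark.csv")
      .option("header", "true") // optional: write a header row
      .save("data/myData.csv")

    sc.stop()
  }
}

Note that save("data/myData.csv") creates a directory of that name containing a single part file (because of coalesce(1)), not a single flat file.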
