Read Excel files with Apache Spark
(new to Apache Spark)
I tried to create a small Scala Spark app that reads Excel files and inserts the data into a database, but I get some errors that I think are caused by different library versions.
Scala v2.12
Spark v3.0
Spark-Excel v0.13.1
Maven configuration is:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.0.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.crealytics/spark-excel -->
<dependency>
<groupId>com.crealytics</groupId>
<artifactId>spark-excel_2.12</artifactId>
<version>0.13.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.11.1</version>
</dependency>
</dependencies>
Main.scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession
.builder
.appName("SparkApp")
.master("local[*]")
.config("spark.sql.warehouse.dir", "file:///C:/temp") // Necessary to work around a Windows bug in Spark 2.0.0; omit if you're not on Windows.
.getOrCreate()
val path = "file_path"
val excel = spark.read
.format("com.crealytics.spark.excel")
.option("useHeader", "true")
.option("treatEmptyValuesAsNulls", "false")
.option("inferSchema", "false")
.option("location", path)
.option("addColorColumns", "false")
.load()
println(s"excel count is ${excel.count}")
Error is:
Exception in thread "main" scala.MatchError: Map(treatemptyvaluesasnulls -> false, location -> file_path, useheader -> true, inferschema -> false, addcolorcolumns -> false) (of class org.apache.spark.sql.catalyst.util.CaseInsensitiveMap)
at com.crealytics.spark.excel.WorkbookReader$.apply(WorkbookReader.scala:38)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:28)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:18)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:12)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:279)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:268)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:268)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:203)
at main.scala.Main$.main(Main.scala:42)
at main.scala.Main.main(Main.scala)
This happens only when I try to read Excel files, which is when I use the spark-excel library. CSV or TSV works fine.
I think you forgot to specify the Excel file in load, like spark.read....load("Worktime.xlsx"). The options map in your MatchError contains a "location" entry but no "path" entry, and "path" is the key that load(path) adds to the options.
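One way to read the stack trace: WorkbookReader.apply pattern-matches on the options map, and no case fits a map without "path", so Scala throws scala.MatchError with the whole map in the message. The toy model below imitates that behaviour in plain Scala. It is a hypothetical illustration (the names OptionMatchDemo, HasPath and locationOf are made up), not spark-excel's actual code.

```scala
// Toy model (hypothetical, NOT spark-excel's code) of a reader that
// pattern-matches on its options map and has no fallback case.
object OptionMatchDemo {
  // Extractor that only matches when the options contain a "path" key.
  object HasPath {
    def unapply(m: Map[String, String]): Option[String] = m.get("path")
  }

  // With no `case _` branch, a map lacking "path" makes the match fail,
  // and Scala throws scala.MatchError carrying the map itself.
  def locationOf(parameters: Map[String, String]): String =
    parameters match {
      case HasPath(p) => p
    }
}
```

Calling OptionMatchDemo.locationOf(Map("location" -> "file_path", "useheader" -> "true")) fails the same way the question's program does, while a map containing "path" (which is what load("Worktime.xlsx") would produce) returns the file name.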
Sample example:
val df = spark.read
.format("com.crealytics.spark.excel")
.option("dataAddress", "'My Sheet'!B3:C35") // Optional, default: "A1"
.option("header", "true") // Required
.option("treatEmptyValuesAsNulls", "false") // Optional, default: true
.option("inferSchema", "false") // Optional, default: false
.option("addColorColumns", "true") // Optional, default: false
.option("timestampFormat", "MM-dd-yyyy HH:mm:ss") // Optional, default: yyyy-mm-dd hh:mm:ss[.fffffffff]
.option("maxRowsInMemory", 20) // Optional, default None. If set, uses a streaming reader which can help with big files
.option("excerptSize", 10) // Optional, default: 10. If set and if schema inferred, number of rows to infer schema from
.option("workbookPassword", "pass") // Optional, default None. Requires unlimited strength JCE for older JVMs
.schema(myCustomSchema) // Optional, default: Either inferred schema, or all columns are Strings
.load("Worktime.xlsx")
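Applied to the code from the question, a minimal sketch might look like the program below. It assumes the Spark 3.0.0 and spark-excel 0.13.1 dependencies from the question's pom. The file is passed to load() instead of the "location" option, and "header" replaces "useHeader", following the sample above, which marks "header" as required. The object name ExcelReadApp and reading the path from the first command-line argument are hypothetical additions. The database insert is left out, and the options have not been checked against every spark-excel release.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch only: assumes spark-excel 0.13.1 with Spark 3.0.0 / Scala 2.12.
object ExcelReadApp {
  // Options from the question, minus "location" (the path now goes to load()).
  val excelOptions: Map[String, String] = Map(
    "header" -> "true", // replaces "useHeader"
    "treatEmptyValuesAsNulls" -> "false",
    "inferSchema" -> "false",
    "addColorColumns" -> "false"
  )

  // load(path) stores the file under the "path" key of the options map.
  def readExcel(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("com.crealytics.spark.excel")
      .options(excelOptions)
      .load(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("SparkApp")
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "file:///C:/temp") // kept from the question; Windows only
      .getOrCreate()
    try {
      // Hypothetical: take the file path from the first argument.
      val path = args.headOption.getOrElse("file_path")
      val excel = readExcel(spark, path)
      println(s"excel count is ${excel.count()}")
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the options in a plain Map makes it easy to compare them with the map printed in the MatchError.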
I know that this doesn't directly answer your question, but it may still help you solve your issue.