
What are the mandatory options for loading an Excel file?

I have loaded an Excel file from S3 using the syntax below, but I am wondering about the options that need to be set here.

Why is it mandatory to set all of the options below when loading an Excel file? None of these options is mandatory when loading other file types such as csv, del, json, or avro.

val data = sqlContext.read
  .format("com.crealytics.spark.excel")
  .option("location", s3path)
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .option("inferSchema", "true")
  .option("addColorColumns", "true")
  .load(s3path)

I get the error below if any of the above options (except location) is not set:

sqlContext.read.format("com.crealytics.spark.excel").option("location", s3path).load(s3path)

Error message:

Name: java.lang.IllegalArgumentException
Message: Parameter "useHeader" is missing in options.
StackTrace:   at com.crealytics.spark.excel.DefaultSource.checkParameter(DefaultSource.scala:37)
          at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:19)
          at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:7)
          at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:345)
          at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
          at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
          at $anonfun$1.apply(<console>:47)
          at $anonfun$1.apply(<console>:47)
          at time(<console>:36)

Most of the options for spark-excel are mandatory, except for userSchema and sheetName.
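For example, a read that sets every mandatory option and also passes the two optional ones might look like the sketch below. The sheet name and schema are placeholder values, and the exact way the optional values are passed (a sheetName option and a .schema(...) call) is an assumption about this version of the library, so check it against the README for the version you use:

import org.apache.spark.sql.types.{StructField, StringType, StructType}

// Sketch: all mandatory spark-excel options set explicitly, plus the two
// optional ones. "Sheet1" and the one-column schema are placeholders.
val userSchema = StructType(Seq(StructField("name", StringType, nullable = true)))

val df = sqlContext.read
  .format("com.crealytics.spark.excel")
  .option("location", s3path)
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .option("inferSchema", "false")   // off, since an explicit schema is supplied
  .option("addColorColumns", "false")
  .option("sheetName", "Sheet1")    // optional
  .schema(userSchema)               // optional: userSchema
  .load(s3path)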

You can always check this in the DataSource source code, which you can find here.
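For illustration, here is a minimal sketch of what such a required-parameter check might look like. The method name checkParameter is taken from the stack trace above, but the body is an assumption, not the actual spark-excel source:

// Hypothetical re-creation of the check behind the stack trace above;
// not the actual spark-excel code. Returns the option's value when it is
// present, otherwise fails with the same message shown in the question.
object DefaultSourceSketch {
  def checkParameter(parameters: Map[String, String], name: String): String =
    parameters.getOrElse(
      name,
      throw new IllegalArgumentException(s"""Parameter "$name" is missing in options.""")
    )

  def main(args: Array[String]): Unit = {
    val opts = Map("location" -> "s3://bucket/file.xlsx")
    println(checkParameter(opts, "location")) // prints the location value
    checkParameter(opts, "useHeader")         // throws: Parameter "useHeader" is missing in options.
  }
}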

You have to remember that these data source / data connector packages are implemented outside of the Spark project, and each one comes with its own rules and parameters.
