
How to read an .xls file from AWS S3 using Spark in Java? And unable to read a sheet by sheetName

I am trying to read an .xls file from AWS S3, but I get a java.io.FileNotFoundException.

I tried the two approaches below: the first passes the path through option() with the key "location", and the second additionally passes the same path to load().

Dataset<Row> segmentConfigData = spark.read()
                .format("com.crealytics.spark.excel")
                .option("sheetName", "sheet1")
                .option("header","true")
                .option("location","s3a://input/552SegmentConfig.xls")
                .option("useHeader", "true")
                .option("treatEmptyValuesAsNulls", "true")
                .option("inferSchema", "true")
                .option("addColorColumns", "False")
                .load();

Dataset<Row> segmentConfigData = spark.read()
                .format("com.crealytics.spark.excel")
                .option("sheetName", "sheet1")
                .option("header","true")
                .option("location","s3a://input/552SegmentConfig.xls")
                .option("useHeader", "true")
                .option("treatEmptyValuesAsNulls", "true")
                .option("inferSchema", "true")
                .option("addColorColumns", "False")
                .load("s3a://input/552SegmentConfig.xls");

Both attempts fail with a FileNotFoundException. In contrast, I am able to read a .csv file without any problem.

Edit: I have solved this issue. I was using an older version of "com.crealytics.spark.excel", and I was able to read the file once I upgraded the jar.

But now I am facing another issue: I am unable to read any sheet other than the first one. Any help?
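A sheet other than the first can usually be targeted with spark-excel's "dataAddress" option, whose value is an Excel-style reference such as 'sheet2'!A1. The helper below is a minimal sketch of building that value for an arbitrary sheet name. The class and method names are hypothetical, and it assumes the reader follows Excel's quoting convention (sheet name wrapped in single quotes, an embedded single quote doubled), which is worth checking against the spark-excel version in use.

```java
public final class SheetAddress {

    // Builds a value for the "dataAddress" option, e.g. 'sheet2'!A1.
    // Quoting the sheet name lets names containing spaces work; an embedded
    // single quote is doubled, following Excel's reference syntax (assumption:
    // the spark-excel version in use accepts this form).
    static String dataAddress(String sheetName, String startCell) {
        String escaped = sheetName.replace("'", "''");
        return "'" + escaped + "'!" + startCell;
    }

    public static void main(String[] args) {
        // Hypothetical sheet names, used only for illustration.
        System.out.println(dataAddress("sheet2", "A1"));      // 'sheet2'!A1
        System.out.println(dataAddress("Q1 Sales", "B3"));    // 'Q1 Sales'!B3
        System.out.println(dataAddress("Bob's data", "A1"));  // 'Bob''s data'!A1
    }
}
```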


Further, I was only able to read the first sheet of the .xls file. Below is the code snippet:

Dataset<Row> sheetData = spark.read()
    .format("com.crealytics.spark.excel")
    .option("location", path)
    .option("sheetName", sheetName)
    // dataAddress selects the sheet and the top-left cell to read from, e.g. 'sheet2'!A1
    .option("dataAddress", "'" + sheetName + "'!A1")
    .option("header", "true")
    .option("useHeader", "true")
    .option("treatEmptyValuesAsNulls", "true")
    .option("inferSchema", "true")
    .option("addColorColumns", "False")
    .load(path);
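To read several sheets with the options from the snippet above, one approach is to collect them in a Map per sheet and pass that map to Spark's DataFrameReader.options(java.util.Map), with the path still given to load(path). This is a minimal sketch, assuming the sheet names are already known (the names and the class name here are hypothetical). It keeps both "header" and "useHeader", as the snippet does, because which key takes effect depends on the spark-excel version. The Spark call appears only in a comment so the sketch runs without Spark on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class ExcelReadOptions {

    // Mirrors the snippet's reader options for one sheet ("location" is left
    // out here because the path is passed to load(path) separately).
    static Map<String, String> forSheet(String sheetName) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("sheetName", sheetName);
        // Quote the sheet name and double any embedded single quote (Excel convention).
        opts.put("dataAddress", "'" + sheetName.replace("'", "''") + "'!A1");
        opts.put("header", "true");
        opts.put("useHeader", "true");
        opts.put("treatEmptyValuesAsNulls", "true");
        opts.put("inferSchema", "true");
        opts.put("addColorColumns", "False");
        return opts;
    }

    public static void main(String[] args) {
        // Hypothetical sheet names; in practice they must match the workbook.
        List<String> sheets = List.of("sheet1", "sheet2");
        for (String sheet : sheets) {
            Map<String, String> opts = forSheet(sheet);
            // With Spark on the classpath, each map could be used as:
            // spark.read().format("com.crealytics.spark.excel").options(opts).load(path)
            System.out.println(sheet + " -> " + opts.get("dataAddress"));
        }
    }
}
```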
