java.io.FileNotFoundException: s3:/my_bucket/input2.properties while accessing a file from an S3 bucket via a jar on AWS EMR

I am new to Amazon EMR. I need to pass two .properties files stored in S3 as arguments to my sbt assembly jar. Can this be done without reading them through an S3 InputStream? Is there an alternative place to put the properties files other than an S3 bucket, given that the job has to run on an EMR cluster? On local Spark the jar works fine, and it bundles all its dependencies, including the hadoop-aws and aws-java-sdk jars.

Currently I am reading the files with java.io.FileReader from Scala:

import java.io.FileReader
import java.util.Properties

// args(0) and args(1) are the paths to the two .properties files
val fileReader1 = new FileReader(args(0))
val properties1: Properties = new Properties()
properties1.load(fileReader1)

val fileReader2 = new FileReader(args(1))
val properties2: Properties = new Properties()
properties2.load(fileReader2)

The details for running the Spark application are here. This is the error I get:

 Exception in thread "main" java.io.FileNotFoundException: s3:/my_bucket/input2.properties (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at java.io.FileReader.<init>(FileReader.java:58)
        at my_main_class$.main(my_main_class.scala:18)
        at my_main_class.main(my_main_class.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    20/06/25 17:03:30 INFO ShutdownHookManager: Shutdown hook called
    20/06/25 17:03:30 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-4cf77e41-ecbd-4645-93f7-f531766efc31
    Command exiting with ret '1'

I don't know why this is happening, as the path is correct. I have also tried s3a and s3n, but no luck. Any help will be appreciated. Thanks!

I don't know of a way to use Java's FileReader to read files directly from S3.

You can use the Hadoop FileSystem API to read the file from S3.
On an EMR cluster there is a hadoop-assembly jar (I guess the location is /usr/share/aws/emr/emrfs/lib/) that takes care of EMRFS.
In your code, just add hadoop-common.jar, for example as a provided sbt dependency (sketched below).
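
A minimal build.sbt sketch of that dependency, assuming you match the Hadoop version shipped with your EMR release (the version string below is only a placeholder):

// build.sbt -- hypothetical sketch; use the Hadoop version of your EMR release
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.8.5" % "provided"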

Here is a Java example:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*; // FileSystem, Path, FSDataInputStream

Configuration conf = new Configuration();
URI s3uri = new URI("s3://my_bucket");
FileSystem fs = FileSystem.get(s3uri, conf);
Path inFile = new Path("<path-to-properties>/input2.properties"); // relative path or full path
FSDataInputStream in = fs.open(inFile);

NB: Don't use the hadoop-aws jar, as it isn't EMRFS. You can read more about it here.
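
To connect this back to the question's Scala code, here is a minimal sketch of the same approach in Scala, assuming args(0) and args(1) are full s3:// URIs such as s3://my_bucket/input2.properties (the helper name is only illustrative):

import java.util.Properties
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: open the path through the Hadoop FileSystem (EMRFS on EMR)
// and load its contents into a java.util.Properties.
def loadProperties(pathStr: String): Properties = {
  val path = new Path(pathStr)                     // e.g. s3://my_bucket/input2.properties
  val fs = path.getFileSystem(new Configuration()) // picks the FileSystem for the URI scheme
  val in = fs.open(path)
  try {
    val props = new Properties()
    props.load(in)
    props
  } finally {
    in.close()
  }
}

val properties1 = loadProperties(args(0))
val properties2 = loadProperties(args(1))

With this, you pass s3:// URIs as program arguments instead of local file paths, and no java.io.FileReader is involved.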
