
How do I specify an S3 bucket as my input to EMR?

Instead of copying over to HDFS, is it possible to just get an array of objects in a bucket in S3 to be processed in EMR?

I've tried this and keep running into one of two problems. Calling new Path("s3n://...") produces security warnings about missing credentials, even after I add them to the configs. Alternatively, when I use the AWS SDK to access my bucket, running the jar tells me the AWS SDK is missing.

You can pass the S3 locations in the Arguments field of the step.

When adding the step, select Custom JAR and fill in the fields as follows:

JAR location: s3://inbsightshadoop/jar/loganalysis.jar
Main class: None
Arguments: s3://inbsightshadoop/insights-input s3://inbsightshadoop/insights-output
Action on failure: Terminate cluster
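
As a rough sketch of the receiving side: the step's Arguments are expected to arrive as the String[] passed to the JAR's main method (with Main class left as None, the entry point should come from the JAR manifest). The class name StepArgs and its helper methods below are hypothetical and not taken from loganalysis.jar. The example uses only the Java standard library to show how the two S3 URIs could be read and sanity-checked before being handed to a Hadoop job.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: reads the two step arguments (input and output S3 locations).
final class StepArgs {
    final URI input;
    final URI output;

    StepArgs(URI input, URI output) {
        this.input = input;
        this.output = output;
    }

    // Expects exactly two arguments, in the order used in the step above.
    static StepArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: <s3-input> <s3-output>");
        }
        return new StepArgs(toS3Uri(args[0]), toS3Uri(args[1]));
    }

    // Accepts s3://, s3n:// and s3a:// URIs that name a bucket.
    static URI toS3Uri(String raw) {
        URI uri;
        try {
            uri = new URI(raw);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("not a valid URI: " + raw, e);
        }
        String scheme = uri.getScheme();
        boolean isS3 = "s3".equals(scheme) || "s3n".equals(scheme) || "s3a".equals(scheme);
        if (!isS3 || uri.getAuthority() == null) {
            throw new IllegalArgumentException("expected an S3 URI with a bucket: " + raw);
        }
        return uri;
    }

    public static void main(String[] args) {
        StepArgs parsed = parse(args);
        System.out.println("input bucket:  " + parsed.input.getAuthority());
        System.out.println("input prefix:  " + parsed.input.getPath());
        System.out.println("output prefix: " + parsed.output.getPath());
        // In an actual Hadoop driver (assumption: Hadoop is on the classpath, as on EMR),
        // these strings would typically be wrapped in org.apache.hadoop.fs.Path and passed
        // to FileInputFormat.addInputPath / FileOutputFormat.setOutputPath.
    }
}
```

Reading the locations from the arguments, rather than hard-coding them, lets the same JAR be reused with different buckets by changing only the step definition.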

