
Spark Streaming Data to S3

I'm building a data lake in S3, so I would like to store the raw data stream into S3. Below is my code snippet, which I have tried with local storage.

// Receive the Twitter stream and keep only the text of English tweets
val tweets = TwitterUtils.createStream(ssc, None)
val engtweets = tweets
  .filter(status => status.getLang() == "en")
  .map(x => x.getText())

import sql.implicits._

// Convert each micro-batch to a DataFrame and write it out as JSON
engtweets.foreachRDD { rdd =>
  val df = rdd.toDF()
  df.write.format("json").save("../Ramesh")
}

I would like to store the raw data (the entire JSON object) in S3.

Just set up the access key and secret key in core-site.xml as follows:

<property>
    <name>fs.s3a.access.key</name>
    <value>...</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>...</value>
</property>
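If editing core-site.xml is not an option, the same two properties can also be set programmatically on the Hadoop configuration that Spark uses. A minimal sketch, assuming ssc is the StreamingContext from the question and the key values are placeholders:

```scala
// Set the S3A credentials on the underlying Hadoop configuration.
// "YOUR_ACCESS_KEY" / "YOUR_SECRET_KEY" are placeholders; in production
// prefer IAM roles or a credentials provider over hard-coded keys.
val hadoopConf = ssc.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
hadoopConf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")
```

This configuration fragment only takes effect for jobs launched through that SparkContext.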

Once you have done this, you should be able to write into S3 using the s3a protocol, with a path of the form s3a://<bucket>/<path>.

Hope this helps!
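Putting this together with the snippet from the question, the only change needed is the output path. A sketch, assuming my-bucket is a hypothetical bucket name and that the credentials above are already configured:

```scala
// Write each non-empty micro-batch to S3 as JSON files.
engtweets.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {            // skip empty batches to avoid empty output dirs
    val df = rdd.toDF()
    df.write
      .format("json")
      .mode("append")              // keep files from earlier micro-batches
      .save("s3a://my-bucket/raw-tweets/")
  }
}
```

Append mode matters here: with the default ErrorIfExists mode, the second micro-batch would fail because the output path already exists.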

You can simply use the saveAsTextFile method with the path prefixed as

s3a://<file path>

provided your Amazon S3 is set up correctly, with or without credentials.
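For the DStream in the question, that could look like the following. A sketch, assuming my-bucket is a hypothetical bucket name; the batch time is used to give each micro-batch its own output directory, since saveAsTextFile refuses to overwrite an existing path:

```scala
// Save each micro-batch of tweet text as plain text files on S3.
engtweets.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    // One directory per batch, keyed by the batch timestamp.
    rdd.saveAsTextFile(s"s3a://my-bucket/raw-tweets/${time.milliseconds}")
  }
}
```

Alternatively, DStream's built-in saveAsTextFiles("s3a://my-bucket/raw-tweets/tweets") achieves the same per-batch naming automatically.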

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_s3.html
