
Spark Streaming Data to S3

I'm building a data lake in S3, so I would like to store the raw data stream into S3. Below is my code snippet, which I have tried with local storage.

// Receive the Twitter stream and keep only the text of English tweets
val tweets = TwitterUtils.createStream(ssc, None)
val engtweets = tweets
  .filter(status => status.getLang() == "en")
  .map(x => x.getText())

import sql.implicits._

// Convert each micro-batch to a DataFrame and write it out as JSON
engtweets.foreachRDD { rdd =>
  val df = rdd.toDF()
  df.write.format("json").save("../Ramesh")
}

I would like to store the raw data (the entire JSON object) in S3.

Just set up the access key and secret key in core-site.xml as follows:

<property>
    <name>fs.s3a.access.key</name>
    <value>...</value>
</property>
<property>
    <name>fs.s3a.secret.key</name>
    <value>...</value>
</property>
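If editing core-site.xml is not an option, the same two properties can also be set programmatically on the Hadoop configuration that Spark uses. A minimal sketch, assuming ssc is the StreamingContext from the question and the key values are placeholders:

```scala
// Set the S3A credentials on the underlying Hadoop configuration.
// "YOUR_ACCESS_KEY" / "YOUR_SECRET_KEY" are placeholders; in production
// prefer IAM roles or a credentials provider over hard-coded keys.
val hadoopConf = ssc.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
hadoopConf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")
```

This configuration fragment only takes effect for jobs launched through that SparkContext.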

Once you have done this, you should be able to write into S3 using the s3a protocol, with a path of the form s3a://<bucket>/<path>.

Hope this helps!
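Putting this together with the snippet from the question, the only change needed is the output path. A sketch, assuming my-bucket is a hypothetical bucket name and that the credentials above are already configured:

```scala
// Write each non-empty micro-batch to S3 as JSON files.
engtweets.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {            // skip empty batches to avoid empty output dirs
    val df = rdd.toDF()
    df.write
      .format("json")
      .mode("append")              // keep files from earlier micro-batches
      .save("s3a://my-bucket/raw-tweets/")
  }
}
```

Append mode matters here: with the default ErrorIfExists mode, the second micro-batch would fail because the output path already exists.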

You can simply use the saveAsTextFile method with the path prefixed as

s3a://<file path>

provided your Amazon S3 is set up correctly, with or without credentials.
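For the DStream in the question, that could look like the following. A sketch, assuming my-bucket is a hypothetical bucket name; the batch time is used to give each micro-batch its own output directory, since saveAsTextFile refuses to overwrite an existing path:

```scala
// Save each micro-batch of tweet text as plain text files on S3.
engtweets.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    // One directory per batch, keyed by the batch timestamp.
    rdd.saveAsTextFile(s"s3a://my-bucket/raw-tweets/${time.milliseconds}")
  }
}
```

Alternatively, DStream's built-in saveAsTextFiles("s3a://my-bucket/raw-tweets/tweets") achieves the same per-batch naming automatically.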

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_s3.html
