
Spark: Writing data frame to S3 bucket

I am trying to write DF data to an S3 bucket, and it is working fine as expected. Now I want to write to the S3 bucket based on a condition.

The data frame has a column named Flag whose values are T and F. The condition is: if Flag is F, the data should be written to the S3 bucket; otherwise it should not. Please find the details below.

DF Data:

1015,2017/08,新潟,101,SW,39,1015,2017/08,山形,101,SW,10,29,74.35897435897436,11.0,F
1015,2017/08,新潟,101,SW,39,1015,2017/08,大分,101,SW,14,25,64.1025641025641,15.4,F
1015,2017/08,新潟,101,SW,39,1015,2017/08,山口,101,SW,6,33,84.61538461538461,6.6,T
1015,2017/08,新潟,101,SW,39,1015,2017/08,愛媛,101,SW,5,34,87.17948717948718,5.5,T
1015,2017/08,新潟,101,SW,39,1015,2017/08,神奈川,101,SW,114,75,192.30769230769232,125.4,F
1015,2017/08,新潟,101,SW,39,1015,2017/08,富山,101,SW,12,27,69.23076923076923,13.2,F
1015,2017/08,新潟,101,SW,39,1015,2017/08,高知,101,SW,3,36,92.3076923076923,3.3,T
1015,2017/08,新潟,101,SW,39,1015,2017/08,岩手,101,SW,11,28,71.7948717948718,12.1,F
1015,2017/08,新潟,101,SW,39,1015,2017/08,三重,101,SW,45,6,15.384615384615385,49.5,F
1015,2017/08,新潟,101,SW,39,1015,2017/08,京都,101,SW,23,16,41.02564102564102,25.3,F
1015,2017/08,新潟,101,SW,39,1015,2017/08,静岡,101,SW,32,7,17.94871794871795,35.2,F
1015,2017/08,新潟,101,SW,39,1015,2017/08,鹿児島,101,SW,18,21,53.84615384615385,19.8,F
1015,2017/08,新潟,101,SW,39,1015,2017/08,福島,101,SW,17,22,56.41025641025641,18.7,F

Code:

val df = spark.read.format("csv").option("header","true").option("inferSchema","true").load("s3a://test_system/transcation.csv")
df.createOrReplaceTempView("data")
val res = spark.sql("select count(*) from data")
res.show(10)
res.coalesce(1).write.format("csv").option("header","true").mode("Overwrite")
  .save("s3a://test_system/Output/Test_Result")
res.createOrReplaceTempView("res1")
val res2 = spark.sql("select distinct flag from res1 where flag = 'F'")
if (res2 === 'F')
{
  //writing to s3 bucket as raw data. Here transcation.csv file.
  df.write.format("csv").option("header","true").mode("Overwrite")
    .save("s3a://test_system/Output/Test_Result/rawdata")
}

I am trying this approach but it is not exporting the df data to the S3 bucket. How can I export/write the data to the S3 bucket based on a condition?

Many thanks for your help.

I am assuming you want to write the dataframe when an "F" flag is present in the dataframe. Note that in your original snippet res2 is a DataFrame, not a value, so comparing it directly to 'F' cannot test the column's contents; the value has to be extracted first, as shown below.

val df = spark.read.format("csv").option("header","true").option("inferSchema","true").load("s3a://test_system/transcation.csv")
df.createOrReplaceTempView("data")
val res = spark.sql("select count(*) from data")
res.show(10)
res.coalesce(1).write.format("csv").option("header","true").mode("Overwrite")
  .save("s3a://test_system/Output/Test_Result")
res.createOrReplaceTempView("res1")

Here we query the data table, since the res1 table is just the count table you created above. From the result dataframe, we select just the first row using the first() function and the first column of that row using getAs[String](0).

val res2 = spark.sql("select distinct flag from data where flag = 'F'").first().getAs[String](0)

println("Printing out res2 = " + res2)

Here we are doing a comparison between the string extracted above and the string "F". Remember that "F" is a String while 'F' is a Char in Scala.

if (res2.equals("F"))
{
  println("Inside the if loop")
  //writing to s3 bucket as raw data .Here transcation.csv file.
  df.write.format("csv").option("header","true").mode("Overwrite")
    .save("s3a://test_system/Output/Test_Result/rawdata")
}
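As a side note, first() throws a NoSuchElementException when the query returns no rows, i.e. when no F flag exists at all, so the snippet above assumes at least one matching row. A minimal defensive sketch, assuming the same data view and flag column, and Spark 2.4+ for Dataset.isEmpty (on older versions, head(1).isEmpty works the same way):

// Guard against an empty result before calling first().
// Assumes the "data" temp view and "flag" column used above.
val flagRows = spark.sql("select distinct flag from data where flag = 'F'")

if (!flagRows.isEmpty) {
  // A non-empty result means at least one row has flag = 'F',
  // since the WHERE clause already filters on that value.
  df.write.format("csv").option("header", "true").mode("Overwrite")
    .save("s3a://test_system/Output/Test_Result/rawdata")
}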
