
Spark java.lang.UnsupportedOperationException: empty collection

When I run this code, I get an "empty collection" error in some cases:

    val result = df
                  .filter(col("channel_pk") === "abc")
                  .groupBy("member_PK")
                  .agg(sum(col("price") * col("quantityOrdered")) as "totalSum")
                  .select("totalSum")
                  .rdd.map(_(0).asInstanceOf[Double]).reduce(_ + _)

The error happens at this line:

.rdd.map(_(0).asInstanceOf[Double]).reduce(_ + _)

When the collection is empty, I want the result to be 0. How can I do that?

The error appears only at that line because reduce is the first action you trigger; before that, Spark doesn't execute anything (laziness). Your df is simply empty. You can verify this by adding a check before the aggregation, as shown below.
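A minimal sanity check (df is the DataFrame from the question):

    // Fails fast if df contains no rows at all.
    // take(1) fetches at most one row, so it is much cheaper than count().
    assert(!df.take(1).isEmpty, "df is empty")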

When the collection is empty, I want the result to be 0. How can I do that?

Before you run the aggregation, check whether the DataFrame has any rows:

    val result = if (df.take(1).isEmpty) 0.0 else df
      .filter(col("channel_pk") === "abc")
      .groupBy("member_PK")
      .agg(sum(col("price") * col("quantityOrdered")) as "totalSum")
      .select("totalSum")
      .rdd.map(_(0).asInstanceOf[Double]).reduce(_ + _)

Or you can use count, though take(1) is cheaper because it stops as soon as one row is found, while count scans the whole DataFrame:

    val result = if (df.count() == 0) 0.0 else df
      .filter(col("channel_pk") === "abc")
      .groupBy("member_PK")
      .agg(sum(col("price") * col("quantityOrdered")) as "totalSum")
      .select("totalSum")
      .rdd.map(_(0).asInstanceOf[Double]).reduce(_ + _)
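Note that both guards inspect df before the filter is applied, so a non-empty df whose filter matches no rows would still fail at reduce. A guard-free alternative (a sketch, not part of the original answer) is to replace reduce with RDD.fold, which returns its zero value on an empty RDD:

    val result = df
      .filter(col("channel_pk") === "abc")
      .groupBy("member_PK")
      .agg(sum(col("price") * col("quantityOrdered")) as "totalSum")
      .select("totalSum")
      .rdd
      // getDouble(0) assumes totalSum is a DoubleType column,
      // matching the asInstanceOf[Double] cast above
      .map(_.getDouble(0))
      // fold returns the zero value (0.0) when the RDD is empty,
      // instead of throwing "empty collection" like reduce does
      .fold(0.0)(_ + _)

The zero value must be neutral for the operation (0.0 for addition), because fold applies it once per partition before merging the partial results.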
