
Spark Structured Streaming: joining an aggregate dataframe to a dataframe

I have a streaming dataframe that at some point could look like:

+--------------------+--------------------+
|               owner|              fruits|
+--------------------+--------------------+
|               Brian|               apple|
|               Brian|                pear|
|               Brian|                date|
|               Brian|             avocado|
|                 Bob|             avocado|
|                 Bob|               apple|
|                 ...|                 ...|
+--------------------+--------------------+

I performed a groupBy and an agg with collect_list to clean things up:

import org.apache.spark.sql.functions._
val myFarmDF = farmDF.withWatermark("timeStamp", "1 second").groupBy("owner").agg(collect_list(col("fruits")) as "fruitsA")
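For illustration, with the sample data above, myFarmDF would then look roughly like:

+--------------------+--------------------+
|               owner|             fruitsA|
+--------------------+--------------------+
|               Brian|[apple, pear, dat...|
|                 Bob|    [avocado, apple]|
+--------------------+--------------------+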

The output is a single row for each owner with an array of all of their fruits. I would now like to join this cleaned-up array back to the original streaming dataframe, dropping the fruits column and keeping just the fruitsA column:

val joinedDF = farmDF.join(myFarmDF, "owner").drop("fruits")

This seems to work in my head, but Spark doesn't seem to agree.

I get a

Failure when resolving conflicting references in Join:
'Join Inner
...
+- AnalysisBarrier
      +- Aggregate [name#17], [name#17, collect_list(fruits#61, 0, 0) AS fruitA#142]

When I turn everything into a static dataframe, it works just fine. Is this not possible in a streaming context?

Have you tried renaming the column? There is a similar problem: https://issues.apache.org/jira/browse/SPARK-19860
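For what it's worth, here is a minimal sketch of that rename workaround, assuming the farmDF schema from the question (owner, fruits, timeStamp). Aliasing the grouping key on the aggregated side keeps the two sides of the self-join from sharing the same attribute references; whether the resulting streaming join is supported still depends on your Spark version and output mode.

import org.apache.spark.sql.functions._

// Hypothetical rename workaround: alias the grouping key on the aggregated
// side so the self-join no longer resolves to the same attributes twice.
val aggDF = farmDF
  .withWatermark("timeStamp", "1 second")
  .groupBy(col("owner").as("aggOwner"))
  .agg(collect_list(col("fruits")) as "fruitsA")

val joinedDF = farmDF
  .join(aggDF, farmDF("owner") === aggDF("aggOwner"))
  .drop("fruits", "aggOwner")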
