I have a streaming DataFrame that, at some point, might look like this:
+------+--------+
| owner|  fruits|
+------+--------+
| Brian|   apple|
| Brian|    pear|
| Brian|    date|
| Brian| avocado|
|   Bob| avocado|
|   Bob|   apple|
|   ...|     ...|
+------+--------+
I performed a groupBy followed by agg(collect_list(...)) to clean things up:
import org.apache.spark.sql.functions.{col, collect_list}
val myFarmDF = farmDF.withWatermark("timeStamp", "1 seconds").groupBy("owner").agg(collect_list(col("fruits")) as "fruitsA")
The output is a single row for each owner with an array of all of that owner's fruits. I would now like to join this cleaned-up array back to the original streaming DataFrame, dropping the fruits column and keeping only the fruitsA column:
val joinedDF = farmDF.join(myFarmDF, "owner").drop("fruits")
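For reference, the result these two statements are meant to produce can be sketched with plain Scala collections (no Spark, no streaming), using the sample rows from the table above. The object and value names are illustrative only. One difference from Spark: `groupBy` on a `Seq` keeps each owner's fruits in input order, whereas Spark's `collect_list` does not guarantee any particular order after a shuffle.

```scala
object FarmJoinSketch {
  // (owner, fruit) rows mirroring the sample table in the question.
  val farm: Seq[(String, String)] = Seq(
    ("Brian", "apple"), ("Brian", "pear"), ("Brian", "date"),
    ("Brian", "avocado"), ("Bob", "avocado"), ("Bob", "apple")
  )

  // Analogue of groupBy("owner").agg(collect_list("fruits") as "fruitsA"):
  // one entry per owner holding every fruit seen for that owner.
  val fruitsByOwner: Map[String, Seq[String]] =
    farm.groupBy(_._1).map { case (owner, rows) => owner -> rows.map(_._2) }

  // Analogue of farmDF.join(myFarmDF, "owner").drop("fruits"):
  // every original row keeps its owner and gains that owner's full list.
  val joined: Seq[(String, Seq[String])] =
    farm.map { case (owner, _) => owner -> fruitsByOwner(owner) }

  def main(args: Array[String]): Unit = joined.foreach(println)
}
```

Each of the six input rows comes out once, paired with its owner's complete fruit list, which is the shape the streaming join is expected to yield.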
This seems like it should work, but Spark doesn't agree. I get the following error:
Failure when resolving conflicting references in Join:
'Join Inner
...
+- AnalysisBarrier
+- Aggregate [name#17], [name#17, collect_list(fruits#61, 0, 0) AS fruitA#142]
When I turn everything into a static dataframe, it works just fine. Is this not possible in a streaming context?
Have you tried renaming the column? There is a similar issue: https://issues.apache.org/jira/browse/SPARK-19860
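A minimal sketch of the renaming idea, assuming Spark 2.x or later and using a static DataFrame built from the sample rows as a stand-in for the streaming `farmDF`. The aggregated side's key is renamed (the name `aggOwner` is hypothetical) so the two sides of the join share no column names, and the join condition is written out explicitly. This only illustrates the rename pattern; it is not verified here that it avoids the error in the streaming query.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

object RenameBeforeJoin {
  // Builds the static sample data, aggregates it, and joins the lists back on.
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Static stand-in for the streaming farmDF (no timeStamp column here).
    val farmDF = Seq(
      ("Brian", "apple"), ("Brian", "pear"), ("Brian", "date"),
      ("Brian", "avocado"), ("Bob", "avocado"), ("Bob", "apple")
    ).toDF("owner", "fruits")

    // One row per owner; the key is renamed so that neither side of the
    // join below shares a column name with the other.
    val myFarmDF = farmDF
      .groupBy("owner")
      .agg(collect_list(col("fruits")) as "fruitsA")
      .withColumnRenamed("owner", "aggOwner") // hypothetical name

    // Explicit condition on the now-distinct key columns, then drop the
    // per-row fruit and the duplicate key column.
    farmDF
      .join(myFarmDF, col("owner") === col("aggOwner"))
      .drop("fruits", "aggOwner")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-before-join")
      .getOrCreate()
    run(spark).show(false) // false: do not truncate the arrays
    spark.stop()
  }
}
```

Against static data this yields one row per input row with columns `owner` and `fruitsA`; the same rename could then be tried on the streaming DataFrames.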