
Using one table to update another table in Spark

I have two tables (or DataFrames), and I want to use one to update the other. I also know that Spark SQL does not support update a set a.1 = b.1 from b where a.2 = b.2 and a.update < b.update. Please suggest how I can achieve this, since such an update statement is not possible in Spark.

table1

+------+----+------+
|number|name|update|
+------+--- -------+
|     1|   a| 08-01|
|     2|   b| 08-02|
+------+----+------+

table2

    +------+----+------+
    |number|name|update|
    +------+--- -------+
    |     1|  a2| 08-03|
    |     3|   b| 08-02|
    +------+----+------+

I want to get this:

    +------+----+------+
    |number|name|update|
    +------+--- -------+
    |     1|  a2| 08-03|
    |     2|   b| 08-02|
    |     3|   b| 08-02|
    +------+----+------+

Is there any other way to do this in Spark?

Using pyspark, you could use subtract() to find the number values of table1 that are not present in table2, and then unionAll table2 with table1 filtered down to those missing rows.

# Collect the 'number' keys that exist in table1 but not in table2
diff = (table1.select('number')
        .subtract(table2.select('number'))
        .rdd.map(lambda x: x[0]).collect())

# Append those missing rows from table1 to table2 (unionAll is called union in Spark 2.x+)
table2.unionAll(table1[table1.number.isin(diff)]).orderBy('number').show()
+------+----+------+
|number|name|update|
+------+----+------+
|     1|  a2| 08-03|
|     2|   b| 08-02|
|     3|   b| 08-02|
+------+----+------+
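
If you also need the a.update < b.update condition from the original SQL (only take the row from table2 when it is actually newer), a full outer join on number followed by picking the newer row per key is another option. The sketch below assumes the table1/table2 DataFrames and column names from the question, and that update compares correctly as-is (e.g. a sortable string or date); it is one possible approach, not the only one.

from pyspark.sql import functions as F

t1 = table1.alias('t1')
t2 = table2.alias('t2')

# Full outer join on the key; with a USING-style join the 'number' column is merged.
joined = t1.join(t2, on='number', how='full_outer')

# Prefer table2's row when it exists and is newer than table1's (or table1 has no row).
take_t2 = (F.col('t2.update').isNotNull() &
           (F.col('t1.update').isNull() | (F.col('t1.update') < F.col('t2.update'))))

result = joined.select(
    F.col('number'),
    F.when(take_t2, F.col('t2.name')).otherwise(F.col('t1.name')).alias('name'),
    F.when(take_t2, F.col('t2.update')).otherwise(F.col('t1.update')).alias('update'),
)

result.orderBy('number').show()

With the sample data this produces the same three rows as above, but it also keeps table1's version of a key when table2's update value is older.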
