PySpark: create combinations from a list
Say I have a DataFrame:
df = spark.createDataFrame([['some_string', 'A'],['another_string', 'B']],['a','b'])
a              | b
---------------+---
some_string    | A
another_string | B
I also have a list of ints, like [1,2,3]. What I want is to add the list as a column to my DataFrame, so that every original row is repeated once for each value in the list:
a              | b | c
---------------+---+---
some_string    | A | 1
some_string    | A | 2
some_string    | A | 3
another_string | B | 1
another_string | B | 2
another_string | B | 3
Is there any way to do this without a UDF?
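In plain Python terms, the desired output is the Cartesian product of the DataFrame's rows and the list. The minimal sketch below uses `itertools.product` on plain tuples (no Spark involved; `rows` and `expected` are illustrative names) to show the row count and ordering the question asks for:

```python
from itertools import product

# Rows of the example DataFrame, represented as plain (a, b) tuples
rows = [("some_string", "A"), ("another_string", "B")]
ints = [1, 2, 3]

# Pair every row with every list element: the Cartesian product
expected = [(a, b, c) for (a, b), c in product(rows, ints)]

for r in expected:
    print(r)
```

With 2 rows and 3 ints this yields 2 × 3 = 6 rows, which is exactly what a cross join produces.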
Use crossJoin. Please check the code below.
>>> dfa.show()
+--------------+---+
| a| b|
+--------------+---+
| some_string| A|
|another_string| B|
+--------------+---+
>>> dfb.show()
+---+
| id|
+---+
| 1|
| 2|
| 3|
+---+
>>> dfa.crossJoin(dfb).show()
+--------------+---+---+
| a| b| id|
+--------------+---+---+
| some_string| A| 1|
| some_string| A| 2|
| some_string| A| 3|
|another_string| B| 1|
|another_string| B| 2|
|another_string| B| 3|
+--------------+---+---+
You could also just use explode and avoid the unnecessary shuffle caused by joins.
from pyspark.sql import functions as F

ints = [1, 2, 3]
# Build an array column of literals, then explode it into one row per element
df.withColumn("c", F.explode(F.array(*[F.lit(x) for x in ints]))).show()
#+--------------+---+---+
#| a| b| c|
#+--------------+---+---+
#| some_string| A| 1|
#| some_string| A| 2|
#| some_string| A| 3|
#|another_string| B| 1|
#|another_string| B| 2|
#|another_string| B| 3|
#+--------------+---+---+