
PySpark: create combinations from a list

Say I have a DataFrame:

df = spark.createDataFrame([['some_string', 'A'],['another_string', 'B']],['a','b'])
           a               |     b
---------------------------+------------
 some_string               |     A
 another_string            |     B

I also have a list of ints, like [1,2,3]. I want to add the list as a column, so that each row of my DataFrame is repeated once per list element:

           a               |     b     |     c      
---------------------------+-----------+------------
 some_string               |     A     |     1      
 some_string               |     A     |     2      
 some_string               |     A     |     3      
 another_string            |     B     |     1      
 another_string            |     B     |     2      
 another_string            |     B     |     3      

Is there any way to do this without a UDF?

Use crossJoin. Please check the code below.

>>> dfa.show()
+--------------+---+
|             a|  b|
+--------------+---+
|   some_string|  A|
|another_string|  B|
+--------------+---+

>>> dfb.show()
+---+
| id|
+---+
|  1|
|  2|
|  3|
+---+

>>> dfa.crossJoin(dfb).show()
+--------------+---+---+
|             a|  b| id|
+--------------+---+---+
|   some_string|  A|  1|
|   some_string|  A|  2|
|   some_string|  A|  3|
|another_string|  B|  1|
|another_string|  B|  2|
|another_string|  B|  3|
+--------------+---+---+

You could also just use explode and avoid the join entirely, along with any shuffle a join might introduce.

ints=[1,2,3]

from pyspark.sql import functions as F

df.withColumn("c", F.explode(F.array(*[F.lit(x) for x in ints]))).show()

#+--------------+---+---+
#|             a|  b|  c|
#+--------------+---+---+
#|   some_string|  A|  1|
#|   some_string|  A|  2|
#|   some_string|  A|  3|
#|another_string|  B|  1|
#|another_string|  B|  2|
#|another_string|  B|  3|
#+--------------+---+---+
