create pyspark dataframe based on condition and list of values
I have a value in a variable, ID, say 1, and a list of ten values, say
LIST1 = [1,2,3,4,5,6,7,8,9,10]
Now I want to create a PySpark DataFrame like this:
ID LIST
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
1 10
NOTE: The length of LIST1 is dynamic; the number of rows should match its length.
It depends on whether the ID is constant, or whether you will also have a LIST2 with ID 2 and then want to union both into one DataFrame.
As far as a constant ID is concerned, there are two options:
ID = 1
LIST1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
source = list(map(lambda x: (ID, x), LIST1))
# source: [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (1, 10)]
df = spark.createDataFrame(source, ['ID', 'LIST'])
df.show()
# +---+----+
# | ID|LIST|
# +---+----+
# | 1| 1|
# | 1| 2|
# | 1| 3|
# | 1| 4|
# | 1| 5|
# | 1| 6|
# | 1| 7|
# | 1| 8|
# | 1| 9|
# | 1| 10|
# +---+----+
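The map(lambda ...) step above can equally be written as a list comprehension, which many find more readable; the pairs it produces are identical, and this part is plain Python, so it can be checked without a Spark session:

```python
ID = 1
LIST1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Build (ID, value) pairs with a list comprehension instead of map/lambda
source = [(ID, x) for x in LIST1]
print(source[:3])  # [(1, 1), (1, 2), (1, 3)]

# Then, exactly as before (assuming a live SparkSession named spark):
# df = spark.createDataFrame(source, ['ID', 'LIST'])
```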
or
from pyspark.sql.functions import lit
ID = 1
LIST1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
source = list(map(lambda x: (x,), LIST1))
# createDataFrame needs iter of iters -> list/tuple of lists/tuples
df = spark.createDataFrame(source, ['LIST'])
df.withColumn('ID', lit(ID)).show()
# +----+---+
# |LIST| ID|
# +----+---+
# |   1|  1|
# |   2|  1|
# |   3|  1|
# |   4|  1|
# |   5|  1|
# |   6|  1|
# |   7|  1|
# |   8|  1|
# |   9|  1|
# |  10|  1|
# +----+---+
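For the non-constant case mentioned at the start (a second list with a different ID), one sketch is to tag each list with its ID in plain Python and pass the combined pairs to a single createDataFrame call, instead of building two DataFrames and unioning them. The second ID/list pair here is hypothetical, purely for illustration:

```python
from itertools import chain

# Hypothetical input: each ID maps to its own list of values
lists_by_id = {1: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
               2: [11, 12, 13]}

# Tag every value with its ID, then flatten into one source list
source = list(chain.from_iterable(
    ((id_, x) for x in values) for id_, values in lists_by_id.items()))
print(source)  # [(1, 1), (1, 2), ..., (1, 10), (2, 11), (2, 12), (2, 13)]

# One call covers both IDs (assuming a live SparkSession named spark):
# df = spark.createDataFrame(source, ['ID', 'LIST'])
```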