
create pyspark dataframe based on condition and list of values

I have a variable ID with the value 1, and a list of ten values, say

LIST1 = [1,2,3,4,5,6,7,8,9,10]

Now I want to create a pyspark data frame as below:

ID  LIST
1   1
1   2
1   3
1   4
1   5
1   6
1   7
1   8
1   9
1   10

NOTE: The length of LIST1 is dynamic; the number of rows should follow it.

It depends on whether the ID is constant, or whether you will also have, say, a LIST2 with ID 2 and then want to union both into one DataFrame.

If the ID is constant, there are two options:

ID = 1
LIST1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

source = list(map(lambda x: (ID, x), LIST1))
# source: [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (1, 10)]

df = spark.createDataFrame(source, ['ID', 'LIST'])
df.show()
# +---+----+                                                                      
# | ID|LIST|
# +---+----+
# |  1|   1|
# |  1|   2|
# |  1|   3|
# |  1|   4|
# |  1|   5|
# |  1|   6|
# |  1|   7|
# |  1|   8|
# |  1|   9|
# |  1|  10|
# +---+----+

or

from pyspark.sql.functions import lit

ID = 1
LIST1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

source = list(map(lambda x: (x,), LIST1))
# createDataFrame needs iter of iters -> list/tuple of lists/tuples
df = spark.createDataFrame(source, ['LIST'])
df.withColumn('ID', lit(ID)).show()
# +----+---+
# |LIST| ID|
# +----+---+
# |   1|  1|
# |   2|  1|
# |   3|  1|
# |   4|  1|
# |   5|  1|
# |   6|  1|
# |   7|  1|
# |   8|  1|
# |   9|  1|
# |  10|  1|
# +----+---+
