How to list distinct values of a pyspark dataframe column with respect to null values in another column
I have a pyspark dataframe:
rowNum Vehicle Production
1 1234 5678
2 null 1254
3 null 4567
4 null 4567
I want to select all distinct values of Production for the rows where Vehicle is null, as a list. How can I do this?
result:
production list=['1254','4567']
How can this be done with a pyspark dataframe?
I would do something like this:
# Using Spark 3.3.0
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Dataset as per the question (Vehicle is stored as the string 'Null' here)
data = [
    [1, '1234', 5678],
    [2, 'Null', 1254],
    [3, 'Null', 4567],
    [4, 'Null', 4567],
]
cols = ['rowNum', 'Vehicle', 'Production']

# Creating Dataframe
df = spark.createDataFrame(data, cols)

# Filter on Vehicle first, while the column is still present, then take the
# distinct Production values and collect them into a Python list
production_list = [
    row.Production
    for row in df.where("Vehicle == 'Null'").select('Production').distinct().collect()
]
production_list
The output I get is as follows: