How to delete/filter specific rows from a Spark DataFrame
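Here is a solution. Based on your dataset, I framed the problem as follows:
The DataFrame below contains incorrect entries. I want to remove all the incorrect records and keep only the correct ones.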
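// Assumes a SparkSession named `spark`, as in spark-shell.
import spark.implicits._                  // needed for toDF
import org.apache.spark.sql.functions.col // needed for col(...) below

// Sample data; note that Age is stored as a string.
val Friends = Seq(
  ("Rahul", "99", "AA"),
  ("Rahul", "20", "BB"),
  ("Rahul", "30", "BB"),
  ("Mahesh", "55", "CC"),
  ("Mahesh", "88", "DD"),
  ("Mahesh", "44", "FF"),
  ("Ramu", "30", "FF"),
  ("Gaurav", "99", "PP"),
  ("Gaurav", "20", "HH")).toDF("Name", "Age", "City")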
Lists used for filtering:
val Name = List("Rahul", "Mahesh", "Gaurav")
val IncorrectAge = List(20, 55)
Filtering the data:
Friends.filter(!(col("Name").isin(Name: _*) && col("Age").isin(IncorrectAge: _*))).show
Here is the output:
+------+---+----+
|  Name|Age|City|
+------+---+----+
| Rahul| 99|  AA|
| Rahul| 30|  BB|
|Mahesh| 88|  DD|
|Mahesh| 44|  FF|
|  Ramu| 30|  FF|
|Gaurav| 99|  PP|
+------+---+----+
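The filter above compares the string-typed Age column against a list of integers, so it relies on Spark coercing the types for the comparison. As a minimal self-contained sketch, assuming a local SparkSession (the object name `FilterBadRows`, the method `keepGood`, and the `local[*]` master are illustrative and not part of the original answer), the same filter can cast Age explicitly so that integers are compared with integers:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FilterBadRows {
  // Builds the sample data and returns only the rows that are NOT flagged as bad.
  def keepGood(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables toDF on local Seqs

    val friends = Seq(
      ("Rahul", "99", "AA"), ("Rahul", "20", "BB"), ("Rahul", "30", "BB"),
      ("Mahesh", "55", "CC"), ("Mahesh", "88", "DD"), ("Mahesh", "44", "FF"),
      ("Ramu", "30", "FF"), ("Gaurav", "99", "PP"), ("Gaurav", "20", "HH")
    ).toDF("Name", "Age", "City")

    val names = List("Rahul", "Mahesh", "Gaurav")
    val incorrectAges = List(20, 55)

    // A row is bad when its name is listed AND its age (cast to int) is listed.
    val isBad = col("Name").isin(names: _*) &&
      col("Age").cast("int").isin(incorrectAges: _*)

    friends.filter(!isBad)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("FilterBadRows")
      .getOrCreate()
    keepGood(spark).show()
    spark.stop()
  }
}
```

Naming the predicate (`isBad`) and negating it in one place makes it easier to see which rows are dropped than an inline `!(... && ...)`.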
You can also do this with a join.
Create a DataFrame of the bad records:
val badrecords = Friends.filter(col("Name").isin(Name: _*) && col("Age").isin(IncorrectAge: _*))
Use a left_anti join to select Friends minus the bad records:
Friends.alias("left").join(badrecords.alias("right"), Seq("Name", "Age"), "left_anti").show
Here is the output:
+------+---+----+
|  Name|Age|City|
+------+---+----+
| Rahul| 99|  AA|
| Rahul| 30|  BB|
|Mahesh| 88|  DD|
|Mahesh| 44|  FF|
|  Ramu| 30|  FF|
|Gaurav| 99|  PP|
+------+---+----+
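One thing to keep in mind: combining two isin checks with && flags every combination of a listed name and a listed age, so a hypothetical row such as ("Rahul", "55") would also be treated as bad. If only specific (Name, Age) pairs are incorrect, a left_anti join against an explicit table of pairs expresses that directly. The sketch below is an illustration under assumptions: the pair list is inferred from the rows removed in the output above, and the names `AntiJoinPairs`, `keepGood`, and `badPairs` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AntiJoinPairs {
  // Returns the rows of the sample data whose (Name, Age) pair is not listed as bad.
  def keepGood(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables toDF on local Seqs

    val friends = Seq(
      ("Rahul", "99", "AA"), ("Rahul", "20", "BB"), ("Rahul", "30", "BB"),
      ("Mahesh", "55", "CC"), ("Mahesh", "88", "DD"), ("Mahesh", "44", "FF"),
      ("Ramu", "30", "FF"), ("Gaurav", "99", "PP"), ("Gaurav", "20", "HH")
    ).toDF("Name", "Age", "City")

    // Assumed list of exact bad pairs; Age is a string so the join keys share a type.
    val badPairs = Seq(
      ("Rahul", "20"), ("Mahesh", "55"), ("Gaurav", "20")
    ).toDF("Name", "Age")

    // left_anti keeps only the left-side rows that have no match in badPairs.
    friends.join(badPairs, Seq("Name", "Age"), "left_anti")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("AntiJoinPairs")
      .getOrCreate()
    keepGood(spark).show()
    spark.stop()
  }
}
```

Because a left_anti join returns only columns from the left side, the result keeps the Name, Age, and City columns of the original DataFrame.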
I think you may want to flip the not condition. filter on a DataFrame is an alias for the where clause in SQL.
So you want the query to be:
df.filter(col("Name").isin(Name:_*) && col("Age").isin(Age:_*))
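To illustrate that filter and where are interchangeable, here is a small sketch on assumed data (the `people` DataFrame, the object name `FilterVsWhere`, and the chosen values are illustrative only). It applies one predicate through `filter`, through `where`, and as a SQL expression string, and returns the three row counts:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FilterVsWhere {
  // Returns the match counts obtained via filter, where, and a SQL string predicate.
  def counts(spark: SparkSession): (Long, Long, Long) = {
    import spark.implicits._ // enables toDF on local Seqs

    val people = Seq(
      ("Rahul", "20"), ("Rahul", "99"), ("Ramu", "30")
    ).toDF("Name", "Age")

    val names = List("Rahul")
    val ages = List("20")

    // Without a negation, the predicate keeps the matching rows.
    val cond = col("Name").isin(names: _*) && col("Age").isin(ages: _*)

    val viaFilter = people.filter(cond).count()
    val viaWhere = people.where(cond).count()
    val viaSql = people.where("Name IN ('Rahul') AND Age IN ('20')").count()
    (viaFilter, viaWhere, viaSql)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("FilterVsWhere")
      .getOrCreate()
    println(counts(spark))
    spark.stop()
  }
}
```

Whether the condition should be negated depends on whether the listed values describe the rows to keep or the rows to drop.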