How to loop through each row of a DataFrame and remove rows based on a condition
I'm a beginner in Spark & Scala. I would like to know how to loop through each row of a DataFrame and remove rows based on a condition.
You can use the filter operation on a DataFrame, where you specify the condition on which you want to filter records. Below is an example:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.{DataFrame, functions => F}

object Example extends App {
  val spark = SparkSession.builder.appName("Simple Application").master("local")
    .getOrCreate()
  import spark.implicits._

  val df1 = spark.sparkContext.parallelize(
    List(
      ("Cust1", "Prod1", "Promo1", 1),
      ("Cust1", "Prod1", "Promo2", 2),
      ("Cust2", "Prod5", "Promo4", 11),
      ("Cust2", "Prod8", "Promo4", 12),
      ("Cust3", "Prod3", "Promo9", 14),
      ("Cust3", "Prod2", "Promo6", 13)
    )).toDF("customer", "product", "promotion", "cardid")

  // Keep only the rows whose product column equals "Prod1"
  df1.filter(F.col("product") === "Prod1").show()
}
The output of the above code is:
+--------+-------+---------+------+
|customer|product|promotion|cardid|
+--------+-------+---------+------+
| Cust1| Prod1| Promo1| 1|
| Cust1| Prod1| Promo2| 2|
+--------+-------+---------+------+
In the above example I kept only the records where the value in the product column is "Prod1", as can be seen in:

df1.filter(F.col("product") === "Prod1")
The filter operation iterates over each row of the DataFrame, checks the provided condition, and keeps all records where the condition is true.
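Since the question asks about removing rows, the same operation works in reverse: filter keeps the rows that satisfy the condition, so to drop the matching rows you negate the condition. A minimal sketch reusing the df1 defined above (=!= is Spark's column inequality operator):

// Drop rows where product is "Prod1" by keeping everything else
df1.filter(F.col("product") =!= "Prod1").show()

// Equivalent form: negate the original equality condition
df1.filter(!(F.col("product") === "Prod1")).show()

Both calls return a new DataFrame containing the four non-Prod1 rows; DataFrames are immutable, so df1 itself is unchanged.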