Better Alternatives to EXCEPT Spark Scala

I have been told that EXCEPT is a very costly operation and that one should always try to avoid using it. My use case:

val myFilter = "rollNo='11' AND class='10'"
val rawDataDf = spark.table("<table_name>")
val myFilteredDataframe = rawDataDf.where(myFilter)
val allOthersDataframe = rawDataDf.except(myFilteredDataframe)

But I am confused: in such a use case, what are my alternatives?

Use a left anti join, as below:

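    import org.apache.spark.sql.functions.lit

    // Two-row example DataFrame: id is 0 and 1, name is the constant "foo"
    val df = spark.range(2).withColumn("name", lit("foo"))
    df.show(false)
    df.printSchema()
    /**
      * +---+----+
      * |id |name|
      * +---+----+
      * |0  |foo |
      * |1  |foo |
      * +---+----+
      *
      * root
      * |-- id: long (nullable = false)
      * |-- name: string (nullable = false)
      */
    // The rows to exclude
    val df2 = df.filter("id=0")
    // A left anti join on all columns keeps the rows of df that have no match in df2
    df.join(df2, df.columns.toSeq, "leftanti")
      .show(false)

    /**
      * +---+----+
      * |id |name|
      * +---+----+
      * |1  |foo |
      * +---+----+
      */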
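
One caveat when swapping except for a left anti join: the two are not strictly equivalent. except behaves like SQL EXCEPT DISTINCT, so it removes duplicate rows from the result and treats NULLs as equal to each other, whereas the anti join above keeps duplicates and never matches a row that has a NULL in a join column. The sketch below applies both to the filter from the question, plus a third option that is not in the original answer: negating the filter directly. The sample rows and the local SparkSession are made up for illustration. The coalesce(..., false) wrapper is an assumption about the desired behaviour, namely that rows where the predicate evaluates to NULL should be kept.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, expr, lit, not}

// Local session for illustration only; in spark-shell `spark` already exists
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("except-alternatives")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the table in the question:
// it contains duplicate rows and one row with a NULL rollNo
val rawDataDf = Seq[(String, String)](
  ("11", "10"),
  ("11", "10"),
  ("12", "10"),
  ("12", "10"),
  ("13", "9"),
  (null, "10")
).toDF("rollNo", "class")

val myFilter = "rollNo='11' AND class='10'"
val myFilteredDataframe = rawDataDf.where(myFilter)

// 1. except: set difference, so duplicates are removed from the result
val viaExcept = rawDataDf.except(myFilteredDataframe)

// 2. left anti join on all columns: duplicates are kept,
//    and the NULL row is kept because NULL = NULL never matches
val viaAntiJoin =
  rawDataDf.join(myFilteredDataframe, rawDataDf.columns.toSeq, "leftanti")

// 3. negated filter: no join at all; coalesce turns a NULL predicate
//    into false so that the NULL row survives the negation
val viaNegatedFilter =
  rawDataDf.where(not(coalesce(expr(myFilter), lit(false))))

viaExcept.show(false)
viaAntiJoin.show(false)
viaNegatedFilter.show(false)
```

If the excluded rows come from a predicate on the same DataFrame, as in the question, the negated filter is usually the simplest choice. The anti join is the more general tool when the rows to exclude come from a different DataFrame.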
