Filter a column based on multiple conditions: Scala Spark

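I'm having trouble filtering the rows of a column based on multiple conditions. Basically, I have the conditions stored in an array and want to filter on all of them at once. However, I always end up with an error. Can anyone suggest a way to solve this? Here is some sample code for what I'm trying to achieve: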

// Now let's filter through the ADM1 codes to select all 50 US States
val stateArray = Array("USAL", "USMD", "USCA", "USME", "USND", "USSD", "USWY", "USAK", "USWA", "USFL",
  "USGA", "USSC", "USNC", "USMA", "USNH", "USVT", "USAR", "USAZ", "USTX", "USLA", "USIL", "USOR", "USNV",
  "USID", "USMN", "USNM", "USNE", "USNJ", "USDE", "USVA", "USWV", "USTN", "USKY", "USNY", "USPA", "USIN",
  "USOH", "USHI", "USOK", "USIA", "USMI", "USMS", "USMO", "USCO", "USKS", "USUT", "USWI", "USMT", "USRI",
  "USCT")

// Let's filter through all of these conditions
val tmpDf3 = tmpDf1.filter(tmpDf1("Actor1Geo_ADM1Code") === stateArray)

// I can do this with a for loop, but I want everything in one data frame
for (n <- stateArray) {
  val tmpDf2 = tmpDf1.filter(tmpDf1("Actor1Geo_ADM1Code") === n)
  tmpDf2.show(false)
  tmpDf2.printSchema()
}

Use isin:

tmpDf1.filter(tmpDf1("Actor1Geo_ADM1Code").isin(stateArray: _*))
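
The `: _*` part matters because `isin` is declared with a varargs parameter (`isin(list: Any*)`), so the array has to be expanded into individual arguments. Here is a minimal plain-Scala sketch of that behaviour, with no Spark involved. The object `VarargsDemo` and the method `isIn` are hypothetical stand-ins for illustration, not part of Spark:

```scala
object VarargsDemo {
  // Hypothetical stand-in for a varargs method such as Column.isin(list: Any*)
  def isIn(value: Any, candidates: Any*): Boolean = candidates.contains(value)

  def main(args: Array[String]): Unit = {
    val stateArray = Array("USAL", "USMD", "USCA")

    // `: _*` expands the array so that each element becomes its own argument
    println(isIn("USMD", stateArray: _*)) // prints "true"

    // Without `: _*` the whole array is passed as ONE candidate,
    // so the string is compared against the array itself
    println(isIn("USMD", stateArray)) // prints "false"
  }
}
```

The second call still compiles because the parameter type is `Any*`, which is why forgetting `: _*` is easy to miss.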

Example:

val states = Array("USAL", "USMD")
// states: Array[String] = Array(USAL, USMD)

val df = Seq((1, "USAL"), (2, "USMD"), (3, "USGA")).toDF("id", "Actor1Geo_ADM1Code")
// df: org.apache.spark.sql.DataFrame = [id: int, Actor1Geo_ADM1Code: string]

df.show
+---+------------------+
| id|Actor1Geo_ADM1Code|
+---+------------------+
|  1|              USAL|
|  2|              USMD|
|  3|              USGA|
+---+------------------+


df.filter(df("Actor1Geo_ADM1Code").isin(states: _*)).show
+---+------------------+
| id|Actor1Geo_ADM1Code|
+---+------------------+
|  1|              USAL|
|  2|              USMD|
+---+------------------+
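
For comparison, the same rows can be selected by building one equality check per value and OR-ing them together. This is roughly the for loop from the question collapsed into a single predicate, so everything ends up in one data frame. This is only a sketch: the object `IsinEquivalence` and the helper `filterByOr` are hypothetical names, and it assumes a local SparkSession so it can run outside spark-shell.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object IsinEquivalence {
  // Hypothetical helper: keep rows whose `column` equals any of `values`.
  // Note: reduce throws on an empty `values`, so at least one value is assumed.
  def filterByOr(df: DataFrame, column: String, values: Seq[String]): DataFrame = {
    // One equality test per value, OR-ed into a single predicate
    val anyMatch = values.map(v => col(column) === v).reduce(_ || _)
    df.filter(anyMatch)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just for trying the snippet out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("isin-equivalence")
      .getOrCreate()
    import spark.implicits._

    val states = Array("USAL", "USMD")
    val df = Seq((1, "USAL"), (2, "USMD"), (3, "USGA")).toDF("id", "Actor1Geo_ADM1Code")

    // Both filters should keep the rows with id 1 and 2
    filterByOr(df, "Actor1Geo_ADM1Code", states.toSeq).show()
    df.filter(col("Actor1Geo_ADM1Code").isin(states: _*)).show()

    spark.stop()
  }
}
```

In practice `isin` is shorter and easier to read; the OR version is mainly useful for seeing what the filter amounts to.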
