Filter a column based on multiple conditions: Scala Spark
I'm having trouble filtering the rows of a column based on multiple conditions. Basically, I have several conditions stored in an array, and I want to filter on all of them at once, but I keep getting an error. Can anyone suggest a way to fix this? Here is some sample code showing what I'm trying to do:
// Now let's filter through the ADM1 codes to select all 50 US States
val stateArray = Array("USAL", "USMD", "USCA", "USME", "USND", "USSD", "USWY", "USAK", "USWA", "USFL",
"USGA", "USSC", "USNC", "USMA", "USNH", "USVT", "USAR", "USAZ", "USTX", "USLA", "USIL", "USOR", "USNV",
"USID", "USMN", "USNM", "USNE", "USNJ", "USDE", "USVA", "USWV", "USTN", "USKY", "USNY", "USPA", "USIN",
"USOH", "USHI", "USOK", "USIA", "USMI", "USMS", "USMO", "USCO", "USKS", "USUT", "USWI", "USMT", "USRI",
"USCT")
// Let's filter on all of these conditions at once -- this line fails
val tmpDf3 = tmpDf1.filter(tmpDf1("Actor1Geo_ADM1Code") === stateArray)

// I can do this with a for loop, but I want everything in one data frame
for (n <- stateArray) {
  val tmpDf2 = tmpDf1.filter(tmpDf1("Actor1Geo_ADM1Code") === n)
  tmpDf2.show(false)
  tmpDf2.printSchema()
}
Use `isin`:

tmpDf1.filter(tmpDf1("Actor1Geo_ADM1Code").isin(stateArray: _*))
Example:
val states = Array("USAL", "USMD")
// states: Array[String] = Array(USAL, USMD)
val df = Seq((1, "USAL"), (2, "USMD"), (3, "USGA")).toDF("id", "Actor1Geo_ADM1Code")
// df: org.apache.spark.sql.DataFrame = [id: int, Actor1Geo_ADM1Code: string]
df.show
+---+------------------+
| id|Actor1Geo_ADM1Code|
+---+------------------+
| 1| USAL|
| 2| USMD|
| 3| USGA|
+---+------------------+
df.filter(df("Actor1Geo_ADM1Code").isin(states: _*)).show
+---+------------------+
| id|Actor1Geo_ADM1Code|
+---+------------------+
| 1| USAL|
| 2| USMD|
+---+------------------+
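Since the original title also asks about doing this "using sql", here is a sketch of two further options: building a SQL `IN` clause over a temp view, and folding the per-value equality conditions together with `||`. This assumes a local `SparkSession` named `spark`; the view name `events` and variable names are illustrative, not from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("isin-demo")
  .getOrCreate()
import spark.implicits._

val states = Array("USAL", "USMD")
val df = Seq((1, "USAL"), (2, "USMD"), (3, "USGA"))
  .toDF("id", "Actor1Geo_ADM1Code")

// Alternative 1: register a temp view and filter with a SQL IN clause
df.createOrReplaceTempView("events")
val inList = states.map(s => s"'$s'").mkString(", ")
val viaSql = spark.sql(
  s"SELECT * FROM events WHERE Actor1Geo_ADM1Code IN ($inList)")
viaSql.show()

// Alternative 2: build one Column by OR-ing the equality conditions together
val combined = states.map(col("Actor1Geo_ADM1Code") === _).reduce(_ || _)
val viaFold = df.filter(combined)
viaFold.show()
```

The `reduce(_ || _)` variant is the more general tool: it works even when the conditions are not simple equalities (e.g. mixing `===`, `startsWith`, and range checks), whereas `isin` only covers membership in a fixed value list.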