
PySpark dataframe filter on multiple columns

Using Spark 2.1.1

Below is my data frame

id  Name1     Name2
1   Naveen    Srikanth
2   Naveen    Srikanth123
3   Naveen    null
4   Srikanth  Naveen

Now I need to filter rows based on two conditions: rows 2 and 3 should be filtered out, because Name2 in row 2 contains the digits 123 and Name2 in row 3 is null.

I am using the code below to filter only on row id 2:

df.select("*").filter(df["Name2"].rlike("[0-9]")).show()

but I got stuck trying to include the second condition.

Doing the following should solve your issue:

from pyspark.sql.functions import col
df.filter((~col("Name2").rlike("[0-9]")) & (col("Name2").isNotNull()))
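For reference, here is a minimal, self-contained sketch applying that corrected filter to the sample data from the question; the local SparkSession setup is assumed purely for illustration. Rows 1 and 4 survive, rows 2 and 3 are dropped:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Assumed setup for illustration only: a local SparkSession and the
# sample rows from the question (row 3 has a null Name2).
spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame(
    [(1, "Naveen", "Srikanth"),
     (2, "Naveen", "Srikanth123"),
     (3, "Naveen", None),
     (4, "Srikanth", "Naveen")],
    ["id", "Name1", "Name2"])

# Keep rows whose Name2 is non-null and contains no digits;
# rows 2 and 3 are filtered out, rows 1 and 4 remain.
df.filter(col("Name2").isNotNull() & ~col("Name2").rlike("[0-9]")).show()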

It should be as simple as putting multiple conditions into the filter.

import spark.sqlContext.implicits._

val df = List(
  ("Naveen", "Srikanth"),
  ("Naveen", "Srikanth123"),
  ("Naveen", null),
  ("Srikanth", "Naveen")).toDF("Name1", "Name2")

df.filter(!$"Name2".isNull && !$"Name2".rlike("[0-9]")).show

or, if you prefer not to use the spark-sql $ syntax:

df.filter(!df("Name2").isNull && !df("Name2").rlike("[0-9]")).show 

or in Python:

df.filter(df["Name2"].isNotNull() & ~df["Name2"].rlike("[0-9]")).show()
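If it helps, the same condition can also be written as a SQL expression string, since filter() accepts either a Column or a string; this variant is not from the original answer and assumes the same df:

# Alternative form: pass the predicate to filter() as a SQL expression string.
df.filter("Name2 IS NOT NULL AND NOT (Name2 RLIKE '[0-9]')").show()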
