
How to filter alphabetic values from a String column in Pyspark Dataframe?

I have a string column that I need to filter. I need to keep only the values that contain letters or special characters.

Initial column:

id
12345
23456
3940A
19045
2BB56
3(40A

Expected output:

id
3940A
2BB56
3(40A

TIA

A simple digits-only regex solves this. The pattern ^\d+$ matches values that consist entirely of digits, so you can keep the rows where it finds no match:

from pyspark.sql import functions as F

df.where(F.regexp_extract('id', r'^\d+$', 0) == '').show()

+-----+
|   id|
+-----+
|3940A|
|2BB56|
|3(40A|
+-----+
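To see why this works, here is a minimal sketch of the same regex logic in plain Python (using the stdlib re module rather than Spark, with the sample ids from the question), assuming the goal is to keep every value that is not purely numeric:

```python
import re

# Same pattern as in the answer: matches strings made up entirely of digits.
DIGITS_ONLY = re.compile(r'^\d+$')

ids = ['12345', '23456', '3940A', '19045', '2BB56', '3(40A']

# Keep the ids where the digits-only pattern does NOT match,
# i.e. values containing at least one letter or special character.
kept = [v for v in ids if not DIGITS_ONLY.match(v)]
print(kept)  # ['3940A', '2BB56', '3(40A']
```

Spark's regexp_extract applies the same pattern per row; when there is no match it returns an empty string, which is what the == '' comparison checks for.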

The question was very vague, so here is the best answer that I can give: filter out the rows whose id consists only of digits, using rlike with a negated condition:

df_filtered = df.filter(~F.col('id').rlike(r'^[0-9]+$'))

