I have a string column that I need to filter. I need to get all the values that contain letters or special characters.
Initial column:
id |
---|
12345 |
23456 |
3940A |
19045 |
2BB56 |
3(40A |
Expected output:
id |
---|
3940A |
2BB56 |
3(40A |
TIA
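A simple digits-only regex solves this. The pattern ^\d+$
matches values that consist entirely of digits. regexp_extract returns an empty string when the pattern does not match, so comparing its result to '' keeps only the rows that are not all digits.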
from pyspark.sql import functions as F
# Raw string so that Python does not treat \d as a string escape
df.where(F.regexp_extract('id', r'^\d+$', 0) == '').show()
+-----+
|   id|
+-----+
|3940A|
|2BB56|
|3(40A|
+-----+
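If you want to check what the pattern does without a Spark session, here is a minimal plain-Python sketch using the standard `re` module on the sample values from the question. The names `ids`, `all_digits` and `non_numeric` are only for illustration. Spark uses Java regex rather than Python's, but for this simple pattern the two behave the same on these values.

```python
import re

# Sample values copied from the question.
ids = ["12345", "23456", "3940A", "19045", "2BB56", "3(40A"]

# ^\d+$ matches only strings made up entirely of digits.
all_digits = re.compile(r"^\d+$")

# Keep the values that the pattern does NOT match.
non_numeric = [v for v in ids if not all_digits.match(v)]
print(non_numeric)  # ['3940A', '2BB56', '3(40A']
```

In PySpark the same condition can also be written with `Column.rlike`, for example `df.where(~F.col('id').rlike(r'^\d+$'))`, which avoids the comparison against an empty string.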
The question was very vague, so here is the best answer that I can give:
# A PySpark Column cannot be iterated in Python, so test for any non-digit character with a regex
df_filtered = df.filter(df.id.rlike(r'\D'))