How to filter alphabetic values from a String column in Pyspark Dataframe?
I have a string column that I need to filter. I need to obtain all the values that contain letters or special characters.
Initial column:

| id |
|---|
| 12345 |
| 23456 |
| 3940A |
| 19045 |
| 2BB56 |
| 3(40A |
Expected output:

| id |
|---|
| 3940A |
| 2BB56 |
| 3(40A |
TIA
A simple digits regex can solve your problem: `^\d+$` matches values that consist entirely of digits, so keep the rows where it does not match.
from pyspark.sql import functions as F

df.where(F.regexp_extract('id', r'^\d+$', 0) == '').show()
+-----+
|   id|
+-----+
|3940A|
|2BB56|
|3(40A|
+-----+
The question was very vague, so here is the best answer that I can give. The original attempt iterated over the column with Python's `any()`, which does not work on a Spark column; the same idea (keep a row when it is not purely digits) can be expressed with `rlike`:

from pyspark.sql import functions as F

df_filtered = df.filter(~F.col('id').rlike(r'^[0-9]+$'))
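As a quick local sanity check (plain Python, no Spark session needed), the same `^\d+$` regex logic reproduces the expected output on the question's sample values:

```python
import re

# Sample values from the question's initial column
values = ["12345", "23456", "3940A", "19045", "2BB56", "3(40A"]

# Keep only values that are NOT entirely digits, mirroring the
# negated ^\d+$ match used in the Spark filter
non_numeric = [v for v in values if not re.fullmatch(r"\d+", v)]

print(non_numeric)  # ['3940A', '2BB56', '3(40A']
```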