How to filter alphabetic values from a string column in a PySpark DataFrame?

Question

I have a string column that I need to filter. I need to obtain all the values that contain letters or special characters.

Initial column:

id
12345
23456
3940A
19045
2BB56
3(40A

Expected output:

id
3940A
2BB56
3(40A

TIA

Answer 1

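A simple digits-only regex solves this. ^\d+$ matches values that consist entirely of digits, and regexp_extract returns an empty string when the pattern does not match, so comparing the result with '' keeps every value that contains a letter or special character.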

from pyspark.sql import functions as F

# regexp_extract returns '' when the pattern does not match, so this keeps the non-numeric ids
df.where(F.regexp_extract('id', r'^\d+$', 0) == '').show()

+-----+
|   id|
+-----+
|3940A|
|2BB56|
|3(40A|
+-----+
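
To see why those three rows survive, the same pattern can be checked outside Spark. This is only a sketch using Python's re module on the sample values from the question; it assumes Python's and Spark's (Java) regex engines agree on ^\d+$ for plain ASCII input like this, and the names ids, digits_only and kept are made up for illustration.

```python
import re

# Sample values copied from the question
ids = ["12345", "23456", "3940A", "19045", "2BB56", "3(40A"]

# Same pattern as the Spark filter: the whole string must be digits
digits_only = re.compile(r"^\d+$")

# Keep the values that do NOT match, mirroring regexp_extract(...) == ''
kept = [v for v in ids if digits_only.search(v) is None]
print(kept)  # ['3940A', '2BB56', '3(40A']
```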

Answer 2

The question was very vague, so here is the best answer that I can give:

# rlike does a partial match, so \D keeps any id that contains a non-digit character
df_filtered = df.filter(df.id.rlike(r'\D'))
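
The per-character idea behind this answer applies to ordinary Python strings rather than to a Spark Column, which cannot be iterated. A minimal sketch of that check in plain Python follows; has_non_digit is a hypothetical helper name, and treating None as "do not keep" is an assumption, not something the question specifies. To run it inside Spark it would have to be wrapped as a UDF (for example with F.udf), which is typically slower than a built-in regex filter.

```python
def has_non_digit(value):
    """Return True if the string contains at least one non-digit character."""
    # Assumption: a missing value (None) is treated as "nothing to keep"
    if value is None:
        return False
    return any(not c.isdigit() for c in value)


# Sample values copied from the question
ids = ["12345", "23456", "3940A", "19045", "2BB56", "3(40A"]

kept = [v for v in ids if has_non_digit(v)]
print(kept)  # ['3940A', '2BB56', '3(40A']
```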
