Spark SQL rlike to find all strings with trailing numbers

While querying a data frame, I tried to use rlike without much success.

Sample data:

column_a|column_b
1|abc xyz
2|123 abc xyz
3|abc 123 xyz
4|abc 123
5|xyz 123

Expected output:

column_a|column_b
4|abc 123
5|xyz 123

I have tried:

select * from table_1 where column_b rlike '\d+$'

Output (no results):

column_a|column_b

I've also tried:

select * from table_1 where column_b rlike '\d*$'

Output (all rows):

column_a|column_b
1|abc xyz
2|123 abc xyz
3|abc 123 xyz
4|abc 123
5|xyz 123

Is my regex incorrect? I have tested it in Python and in an online tester, and it looks correct. Or does rlike only support a specific regex syntax?

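You'll need a bit more escaping to make it work. The SQL parser treats a backslash inside a string literal as an escape character, so '\d+$' reaches the regex engine as 'd+$' (which matches nothing here) and '\d*$' as 'd*$' (which matches every row). When the query is written as a regular Python string, the backslash has to be escaped once for Python and once for SQL. In particular: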

spark.sql("SELECT 'abc 123' RLIKE '\\\\d+$'").show()
+------------------+
|abc 123 RLIKE \d+$|
+------------------+
|              true|
+------------------+
spark.sql("SELECT '123 abc xyz' RLIKE '\\\\d+$'").show()
+----------------------+
|123 abc xyz RLIKE \d+$|
+----------------------+
|                 false|
+----------------------+
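
To see why the extra escaping matters, the two unescaping layers can be sketched in plain Python without a Spark session. This is only an illustration under assumptions: sql_unescape is a hypothetical, simplified stand-in for the SQL parser's backslash handling (it is not Spark's implementation), and Python's re module stands in for the Java regex engine that rlike uses. The rows are the sample data from the question.

```python
import re

def sql_unescape(literal_body: str) -> str:
    # Hypothetical, simplified model of a SQL string literal:
    # a backslash is dropped and the character after it is kept as-is.
    # (A real parser also turns some sequences into control characters.)
    out = []
    i = 0
    while i < len(literal_body):
        ch = literal_body[i]
        if ch == "\\" and i + 1 < len(literal_body):
            out.append(literal_body[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# column_b values from the question's sample data
rows = ["abc xyz", "123 abc xyz", "abc 123 xyz", "abc 123", "xyz 123"]

def matching_rows(literal_body: str):
    # Return the pattern the regex engine would see, plus the rows it matches.
    pattern = sql_unescape(literal_body)
    return pattern, [r for r in rows if re.search(pattern, r)]

single = matching_rows(r"\d+$")     # what the question's first query sends
star = matching_rows(r"\d*$")       # what the question's second query sends
double = matching_rows("\\\\d+$")   # what the answer sends; same as r"\\d+$"

for label, (pattern, hits) in [("single", single), ("star", star), ("double", double)]:
    print(label, repr(pattern), hits)
```

In this model the single-backslash patterns lose their backslash before matching, which reproduces the empty result and the all-rows result from the question, while the doubled backslash survives as \d+$. A character class such as '[0-9]+$' contains no backslash at all, so it should sidestep the escaping issue entirely.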

