Spark SQL rlike to find all strings with trailing numbers

While querying a DataFrame, I have tried to use rlike without much success.

Sample data:

column_a|column_b
1|abc xyz
2|123 abc xyz
3|abc 123 xyz
4|abc 123
5|xyz 123

Expected output:

column_a|column_b
4|abc 123
5|xyz 123

I have tried:

select * from table_1 where column_b rlike '\d+$'
select * from table_1 where column_b rlike '/\d+$'

Output (no results):

column_a|column_b

I've also tried:

select * from table_1 where column_b rlike '\d*$'
select * from table_1 where column_b rlike '/\d*$'

Output (all rows):

column_a|column_b
1|abc xyz
2|123 abc xyz
3|abc 123 xyz
4|abc 123
5|xyz 123

Is my regex incorrect? I have tested it in Python and with an online regex tester, and it looks correct. Or does rlike only support a specific regex syntax?
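A minimal sketch of that kind of Python check, using the sample rows above. It assumes Python's re module is a fair stand-in for the Java regex engine behind Spark's RLIKE, which holds for these simple patterns. It shows that '\d+$' selects exactly rows 4 and 5, while '\d*$' matches every row because '*' also allows zero trailing digits:

```python
import re

# Sample rows from the question as (column_a, column_b) pairs.
rows = [
    (1, "abc xyz"),
    (2, "123 abc xyz"),
    (3, "abc 123 xyz"),
    (4, "abc 123"),
    (5, "xyz 123"),
]

# Raw strings, so the regex engine sees a literal backslash followed by 'd'.
plus_pattern = re.compile(r"\d+$")  # one or more digits at the end
star_pattern = re.compile(r"\d*$")  # zero or more digits at the end

# re.search looks for a match anywhere in the string, as RLIKE does.
matches = [row for row in rows if plus_pattern.search(row[1])]
all_rows = [row for row in rows if star_pattern.search(row[1])]

print(matches)   # [(4, 'abc 123'), (5, 'xyz 123')]
print(all_rows)  # every row: '*' also accepts an empty match at the end
```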

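You'll need a bit more escaping to make it work. Spark SQL treats the backslash as an escape character inside string literals, so with a single backslash the escape is consumed by the SQL parser before the regex engine sees it: '\d+$' arrives as 'd+$', which requires the value to end in the letter d (no rows match), and '\d*$' arrives as 'd*$', which matches the empty string at the end of every value (all rows match). The backslash has to be escaped once for the SQL parser ('\\d+$') and, because the query below is an ordinary Python string, once more for Python, which gives four backslashes. In particular: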

spark.sql("SELECT 'abc 123' RLIKE '\\\\d+$'").show()
+------------------+
|abc 123 RLIKE \d+$|
+------------------+
|              true|
+------------------+
spark.sql("SELECT '123 abc xyz' RLIKE '\\\\d+$'").show()
+----------------------+
|123 abc xyz RLIKE \d+$|
+----------------------+
|                 false|
+----------------------+
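To make the escaping layers concrete, here is a minimal pure-Python model of what happens to the pattern, with no Spark required. The helper unescape_sql_literal is hypothetical and only imitates the part of Spark SQL's default string-literal unescaping that matters here (a backslash is removed and the following character is kept); Python's re module stands in for Java regex, which behaves the same for these patterns.

```python
import re


# Hypothetical helper: a simplified model of Spark SQL's default string-literal
# unescaping. It only models what matters here: a backslash is dropped and the
# next character is kept, so '\\' becomes '\' and an unknown escape like '\d'
# becomes 'd'. Real Spark treats \n, \t, \u0041 and some others differently.
def unescape_sql_literal(body: str) -> str:
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # drop the backslash, keep the next char
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


samples = ["abc xyz", "123 abc xyz", "abc 123 xyz", "abc 123", "xyz 123"]

# Layer 1 (Python): four backslashes in a normal string are two characters.
assert "\\\\d+$" == r"\\d+$"

# Layer 2 (Spark SQL parser): the literal body as it appears in the SQL text.
sql_texts = {
    "one backslash, +": r"\d+$",
    "one backslash, *": r"\d*$",
    "two backslashes, +": r"\\d+$",
}

results = {}
for label, sql_text in sql_texts.items():
    regex = unescape_sql_literal(sql_text)  # what the regex engine receives
    # Layer 3 (regex): RLIKE looks for a match anywhere, like re.search.
    results[label] = (regex, [s for s in samples if re.search(regex, s)])
    print(label, results[label])
```

With one backslash the regex engine receives 'd+$' (no rows) or 'd*$' (every row), which lines up with the two outputs in the question; with two backslashes in the SQL text it receives '\d+$', and only 'abc 123' and 'xyz 123' match. A Python raw string such as r"SELECT 'abc 123' RLIKE '\\d+$'" is an equivalent way to put those two backslashes into the SQL text.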
