REGEXP_REPLACE for spark.sql()
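I need to write a REGEXP_REPLACE query for a spark.sql() job. If the value follows the pattern below, only the word before the first hyphen should be extracted and assigned to the target column "name"; if the pattern does not match, the entire "name" should be returned.
Pattern:
For example:
If name = abc45-dsg5-gfdvh6-9890-7685, output of REGEXP_REPLACE = abc45
If name = abc, output of REGEXP_REPLACE = abc
If name = abc-gf5-dfg5-asd5-98-00, output of REGEXP_REPLACE = abc-gf5-dfg5-asd5-98-00
I have: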
spark.sql("SELECT REGEXP_REPLACE(name , '-[^-]+-\\w{2}-\\d+-\\d+$','',1,1,'i') AS name").show();
But it does not work.
Use
^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$
See proof. Replace with $1. If $1 does not work, use \1. If \1 does not work, use \\1.
Explanation
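The pattern can be checked against the three sample values from the question before it goes into a Spark job. The sketch below uses Python's re module as a stand-in, which is an assumption: Spark evaluates the pattern with Java regular expressions, where the first group is written $1, while Python writes it as \1. The helper name extract_name is hypothetical. Because the pattern contains no backslashes, it should need no extra escaping inside the SQL string, so the Spark call would presumably be regexp_replace(name, '<pattern>', '$1').

```python
import re

# Pattern from the answer above, copied unchanged.
PATTERN = re.compile(r"^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$")


def extract_name(name: str) -> str:
    """Return the part before the first hyphen if the whole value matches
    the pattern; otherwise return the value unchanged."""
    # Python's re module refers to the first capture group as \1.
    return PATTERN.sub(r"\1", name)


# The three sample values from the question.
samples = [
    "abc45-dsg5-gfdvh6-9890-7685",
    "abc",
    "abc-gf5-dfg5-asd5-98-00",
]

for value in samples:
    print(value, "->", extract_name(value))
```

The third value stays unchanged because its fourth segment, asd5, is not purely numeric, so the anchored pattern fails to match and nothing is replaced.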
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^-]*                    any character except: '-' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2 (2 times):
--------------------------------------------------------------------------------
    -                        '-'
--------------------------------------------------------------------------------
    [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  ){2}                     end of \2 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \2)
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
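Two notes in the explanation are easy to confirm by inspecting the match groups: \2 keeps only the last repetition of the quantified group, and $ also matches just before a trailing newline. The sketch below again uses Python's re module as a stand-in for the Java regex engine that Spark uses (an assumption; the two engines agree on these two points as far as I know). The variable names are illustrative only.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r"^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$")

match = pattern.match("abc45-dsg5-gfdvh6-9890-7685")

# Group 1 is everything before the first hyphen.
first_group = match.group(1)

# Group 2 is quantified with {2}, so it holds only the LAST repetition
# ("-gfdvh6"), not the first one ("-dsg5").
second_group = match.group(2)

print("group 1:", first_group)
print("group 2:", second_group)

# "$" also matches just before a single trailing newline, so a value with
# a trailing "\n" still matches; the newline itself is not part of the match.
with_newline = pattern.match("abc45-dsg5-gfdvh6-9890-7685\n")
print("matches with trailing newline:", with_newline is not None)
```

Since only \1 is used in the replacement, the behaviour of \2 does not affect the result; the second group could equally be written as a non-capturing group.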