Impala Regex - find a specific sequence of characters with delimiters between them, some of which are not letters, digits or underscores
I am new to regex and need to search a string field in Impala for multiple matches of this exact sequence of characters: ~FC* followed by 11 more *, which may or may not have letters/digits between them (the asterisks are basically delimiters in this string field). After the 12th * (counting the one in ~FC* as #1), it should be immediately followed by Y~.
Since the asterisks are not letters or digits, I am unsure how to search for these delimiters properly.
This is my SQL so far:
select
regexp_extract(col_name, '(~FC\\*).*(\\*Y~)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1
Data returned:
pattern_found
--------------
~FC*
In Impala SQL, (~FC\\*) returns ~FC*, which is great (I got it from my other question).
I have been trying (~FC\\*).*(\\*Y~), which obviously isn't counting the number of asterisks, but it is also not picking up the Y.
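One likely reason the Y never shows up: the third argument of regexp_extract selects a capture group, and group 1 of (~FC\\*).*(\\*Y~) is only ~FC*. In Impala, index 0 returns the whole match. The sketch below reproduces this with Python's re module rather than Impala (an assumption: the two engines agree on these basic constructs, and Python needs only single backslashes). It also shows that the greedy .* runs to the last *Y~ in the string, so the whole match would span both records.

```python
import re

# Test string from the question, split in two only for line length.
s = ("N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
     "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~")

m = re.search(r"(~FC\*).*(\*Y~)", s)

first_group = m.group(1)   # what index 1 selects: only the first parenthesised part
second_group = m.group(2)  # the second parenthesised part
whole_match = m.group(0)   # the entire match, from the first ~FC* to the LAST *Y~

print(first_group)
print(second_group)
print(whole_match == s[s.index("~FC"):])
```

So switching the index alone would not be enough here; the pattern also has to stop at the first record's Y~ instead of the last one.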
This is a test string; it has 2 occurrences:
N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
The results should be these 2, which share an overlapping ~ between them, but I will settle for at least the first being found if both cannot be:
~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~
~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
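The shared ~ is what makes it hard to get both records from a single scan: once the first match consumes the trailing ~, the second record has no leading ~ left to start from. A minimal sketch in Python's re module (not Impala) shows the effect and one workaround using a lookahead. The lookahead is an assumption that only holds for engines with lookaround support; Impala's RE2-based regex functions do not offer it, as far as I know. The counted repetition {11} stands for eleven "asterisk plus non-asterisk characters" fields.

```python
import re

# Test string from the question, split in two only for line length.
s = ("N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
     "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~")

# ~FC, then 11 fields of "* plus any non-asterisk characters",
# then the 12th * followed immediately by Y~.
record = r"~FC(?:\*[^*]*){11}\*Y~"

# A plain scan consumes the shared ~, so only the first record is found.
plain = re.findall(record, s)

# Wrapping the pattern in a lookahead makes each match zero-width,
# so the shared ~ is still available as the start of the next record.
overlapping = [m.group(1) for m in re.finditer("(?=(" + record + "))", s)]

for r in overlapping:
    print(r)
```

Without lookaround, a common alternative is to leave the trailing ~ out of the match so that consecutive records no longer overlap.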
I figured out a solution, but I am happy to learn of a better way to accomplish this.
This is what worked in Impala SQL; it needed parentheses and double-escaped backslashes for all the asterisks:
(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)
Full SQL:
select
regexp_extract(col_name, '(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1
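The eleven hand-written [^\\*]*\\* repetitions can probably be shortened with a counted repetition. The sketch below, in Python's re module rather than Impala, checks that a {11} form matches the same text as the long pattern on the test string. The names long_pattern and short_pattern are illustrative only.

```python
import re

# Test string from the question, split in two only for line length.
s = ("N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
     "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~")

# The working pattern from above, with single backslashes for Python:
# ~FC*, then eleven "non-asterisks followed by *" fields, then Y.
long_pattern = r"(~FC\*" + r"[^\*]*\*" * 11 + r"Y)"

# Same idea with a counted repetition instead of eleven copies.
short_pattern = r"(~FC\*(?:[^*]*\*){11}Y)"

long_matches = re.findall(long_pattern, s)
short_matches = re.findall(short_pattern, s)

print(short_matches)
```

Because the pattern stops at Y and leaves the ~ unmatched, the two records do not overlap and both are found here. If Impala accepts the same syntax, the short form would be written with doubled backslashes, for example '(~FC\\*(?:[^*]*\\*){11}Y)'; that spelling is an untested assumption. As far as I know, regexp_extract still returns only the first match per row.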
And here is the RegexDemo, without the additional syntax needed for Impala SQL.