Impala Regex - find a specific sequence of characters with delimiters between them, some of which are not letters, digits, or underscores

Question

I am new to regex and need to search a string field in Impala for multiple matches of this exact sequence of characters: ~FC* followed by 11 more * that may or may not have letters/digits between them (the asterisks are basically delimiters in this string field). The 12th * (counting the one in ~FC* as #1) should be immediately followed by Y~ .

Since the asterisks are not letters or digits, I am unsure how to search for these delimiters properly.

This is my SQL so far:

select 
    regexp_extract(col_name, '(~FC\\*).*(\\*Y~)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1

Data returned:

pattern_found
--------------
~FC*

In Impala SQL, (~FC\\*) returns ~FC* , which is great (I got it from my other question).

I have been trying (~FC\\*).*(\\*Y~) , which obviously isn't counting the number of asterisks, but it is also not picking up the Y.
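One likely reason the Y never shows up: the last argument of regexp_extract selects a capture group, and group 1 of this pattern is only (~FC\\*). As far as I know, index 0 returns the whole match instead. The sketch below uses Python's re module as a stand-in, assuming it behaves like Impala's regex engine for this simple pattern; the variable names are illustrative only.

```python
import re

# Test string copied from the question (split across two literals for readability).
sample = (
    "N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
    "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~"
)

# Same pattern as in the SQL. In an Impala string literal each backslash
# is doubled, so \* here corresponds to \\* there.
pattern = re.compile(r"(~FC\*).*(\*Y~)")
m = pattern.search(sample)

group_1 = m.group(1)  # only the first parenthesized part: "~FC*"
group_2 = m.group(2)  # the second parenthesized part: "*Y~"
whole = m.group(0)    # everything the pattern matched

print(group_1)
print(group_2)
print(whole)
```

Note that the greedy .* runs to the last *Y~ in the string, so the whole match here covers both ~FC* records in one piece. That is why the pattern needs to count the asterisks rather than rely on .* .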

This is a test string; it has 2 occurrences:

N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~

The results should be these 2, which share an overlapping ~ between them, but I will settle for at least the first being found if both cannot be.

~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~
~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
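To illustrate what "overlapping ~" means for matching, here is a rough Python sketch (not Impala SQL, since regexp_extract returns a single string per call). It resumes each search on the closing ~ of the previous match so that the same character can open the next record. The function name find_records and the compact pattern are assumptions made for this illustration.

```python
import re

# Test string copied from the question.
sample = (
    "N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
    "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~"
)

# ~FC* followed by 11 "field plus asterisk" pairs, then Y~.
record = re.compile(r"~FC\*(?:[^*]*\*){11}Y~")

def find_records(text):
    """Return every record, letting a closing ~ also open the next record."""
    results = []
    pos = 0
    while True:
        m = record.search(text, pos)
        if m is None:
            break
        results.append(m.group(0))
        # Step back onto the closing "~" so it can start the next match.
        pos = m.end() - 1
    return results

for rec in find_records(sample):
    print(rec)
```

A plain non-overlapping scan would find only the first record here, because the shared ~ has already been consumed by the time the second record would start.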

Answer 1

I figured out a solution, but I am happy to learn of a better way to accomplish this.

This is what worked in Impala SQL; it needed parentheses and double-escaped backslashes for all the asterisks:

(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)

Full SQL:

select 
    regexp_extract(col_name, '(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1

And here is the RegexDemo without the additional syntax needed for Impala SQL.
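A possibly shorter way to write the same idea is a counted repetition: the eleven copies of "anything but an asterisk, then an asterisk" can be collapsed with {11}. The sketch below checks this in Python against the test string from the question. It also adds the trailing ~ that the question asks for. The Impala literal shown in the comment is an untested assumption.

```python
import re

# Test string copied from the question.
sample = (
    "N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
    "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~"
)

# Long form from the answer, built programmatically: 11 repeats, ending in Y.
long_form = re.compile(r"~FC\*" + r"[^\*]*\*" * 11 + "Y")

# Compact form: the same 11 repeats as a counted group, ending in Y~.
# Assumed Impala equivalent (untested): '~FC\\*([^*]*\\*){11}Y~' with index 0.
compact = re.compile(r"~FC\*(?:[^*]*\*){11}Y~")

long_match = long_form.search(sample).group(0)
first = compact.search(sample).group(0)

print(long_match)
print(first)
```

Inside a character class the asterisk has no special meaning, so [^*] and [^\*] behave the same way here.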

Solution 1
0 2022-09-27 19:59:43
