
Impala Regex - find a specific sequence of characters with delimiters between them, some of which are not letters, digits or underscores

I am new to regex and need to search a string field in Impala for multiple matches of this exact sequence of characters: ~FC* followed by 11 more *, which may or may not have letters/digits between them (the asterisks are basically delimiters in this string field). The 12th * (counting the one in ~FC* as #1) should be immediately followed by Y~ .

Since the asterisks are not letters or digits, I am unsure how to search for these delimiters properly.

This is my SQL so far:

select 
    regexp_extract(col_name, '(~FC\\*).*(\\*Y~)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1

Data returned:

pattern_found
--------------
~FC*

In Impala SQL, (~FC\\*) returns ~FC*, which is great (I got it from my other question).

I have been trying (~FC\\*).*(\\*Y~), which obviously isn't counting the number of asterisks, but it is also not picking up the Y.
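One likely reason the Y never shows up: the last argument of regexp_extract selects a capture group, and group 1 in this pattern is only (~FC\\*). The sketch below reproduces the behaviour with Python's re module rather than Impala, which is an assumption made purely so the pattern can be run anywhere; Python raw strings need a single backslash where the Impala string literal needs two. It also shows that the greedy .* runs on to the last *Y~ in the string rather than stopping at the first.

```python
import re

# Test string from the question (two ~FC records that share a "~").
sample = (
    "N4*CITY*STATE*2155446*2120"
    "~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
    "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~"
)

# Same pattern as the Impala attempt, written with single backslashes.
attempt = re.compile(r"(~FC\*).*(\*Y~)")
m = attempt.search(sample)

group_1 = m.group(1)      # first parenthesised group only
group_2 = m.group(2)      # second parenthesised group
whole_match = m.group(0)  # everything the pattern matched

print(group_1)
print(group_2)
# The greedy ".*" swallows both records, up to the final "*Y~".
print(whole_match == sample[sample.index("~FC*"):])
```

So asking for group 1 can only ever return ~FC*, and even the whole match would span both records instead of one.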

This is a test string; it has 2 occurrences:

N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~

The results should be these two, which share an overlapping ~ between them, but I will settle for at least the first being found if both cannot be.

~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~
~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
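Because the two records share the ~ between them, a scan that consumes the trailing ~ can only report the first one. The following is a minimal Python sketch of that effect, not Impala code: it uses a lookahead, which Python's re supports but which, as far as I know, the RE2-style engine behind Impala does not, so treat it as a way to check the expected output rather than a drop-in pattern. The name `record` is just an illustrative variable.

```python
import re

sample = (
    "N4*CITY*STATE*2155446*2120"
    "~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
    "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~"
)

# One record: "~FC*", then 11 more "*"-terminated fields, then "Y~".
record = r"~FC\*(?:[^*]*\*){11}Y~"

# Plain scan: the first match consumes the shared "~", so the second is missed.
plain = re.findall(record, sample)

# Lookahead scan: nothing is consumed, so both overlapping records are captured.
overlapping = re.findall(r"(?=(" + record + r"))", sample)

for rec in overlapping:
    print(rec)
```

Leaving the final ~ out of the pattern (ending at Y) is another way to keep matches from overlapping, at the cost of not returning that character.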

I figured out a solution, but I am happy to learn of a better way to accomplish this.

This is what worked in Impala SQL. It needed parentheses and double-backslash escapes for all of the asterisks:

(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)

Full SQL:

select 
    regexp_extract(col_name, '(~FC\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*[^\\*]*\\*Y)', 1) as "pattern_found"
from db.table
where id = 123456789
limit 1
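A possibly shorter way to write the same thing is to repeat the "field plus asterisk" part with a counted group, {11}, instead of spelling it out eleven times. The Python sketch below only checks that the two forms match the same text on the test string; whether Impala accepts the {11} and (?:...) syntax is an assumption to verify on your own version. `short_form` and `impala_literal` are illustrative names, not anything from Impala.

```python
import re

sample = (
    "N4*CITY*STATE*2155446*2120"
    "~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
    "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~"
)

# The working pattern from above in single-backslash form (11 repeated fields).
long_form = r"(~FC\*" + r"[^\*]*\*" * 11 + r"Y)"

# Hypothetical shorter form: the repeated field written as a counted group.
short_form = r"(~FC\*(?:[^*]*\*){11}Y)"

long_hit = re.search(long_form, sample).group(1)
short_hit = re.search(short_form, sample).group(1)

# Doubling each backslash gives the text to paste into an Impala string literal.
impala_literal = short_form.replace("\\", "\\\\")

print(short_hit)
print(impala_literal)
```

Inside a character class the asterisk has no special meaning, which is why [^*] is used here without a backslash.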

And here is the RegexDemo, without the additional syntax needed for Impala SQL.

