
Regex - find specific sequence of characters, some are not letters, digits or underscore

I am new to regex and need to search a string field in Impala for multiple matches to this exact sequence of characters: ~FC*

Since ~ and * are not letters or digits, I am unsure how to search for these characters in this specific order, rather than for any one of them occurring on its own.

This is what I have tried so far, using both of these patterns: [~FC*] and ^~FC*$

This is a test string, it has 2 occurrences:

N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~
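.*(~FC\*).* or .*(\~FC\*).*

.* - zero or more characters
.*(~FC\*).* - matches a string that contains the literal sequence ~FC* (the backslash escapes the asterisk so it is not treated as a quantifier)

If the first one does not work, please try the second one; it might work if the tilde symbol is reserved in your regex dialect.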
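
As a rough illustration of the two patterns above, here is a sketch in Python's re module rather than Impala (an assumption made only so the example can be run; Impala's regex dialect may differ in details). It suggests that both spellings behave the same, since the tilde is not special here, and that the surrounding .* makes the pattern cover the whole string in a single match, so it tells you whether ~FC* is present rather than how often.

```python
import re

# Test string from the question (split across two literals for readability).
text = ("N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
        "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~")

# The two patterns from the answer: tilde unescaped and escaped.
plain = re.compile(r".*(~FC\*).*")
escaped = re.compile(r".*(\~FC\*).*")

# Both patterns match the entire string.
plain_match = plain.fullmatch(text)
escaped_match = escaped.fullmatch(text)

# The capture group holds the literal sequence that was found.
captured = plain_match.group(1)

# Because .* swallows everything around the group, the whole string is
# consumed by one match, even though ~FC* occurs twice in it.
whole_string_hits = plain.findall(text)
```

In this sketch whole_string_hits has a single element, which is why the wrapped pattern suits a presence check more than a count.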

You can use simple SQL like the query below. It works only for a fixed (hardcoded) search string, not for a regex pattern.

select (length(mycol) - length(replace(mycol,'~FC*','')))/length('~FC*') as occurance_str

Here is the SQL I tested successfully:

select 
(length('N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~')
- length(replace('N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~','~FC*',''))
)/length('~FC*') as occurance_str
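
The arithmetic behind that query can be checked outside the database. The following is a minimal sketch in Python, not Impala; the variable names are made up for illustration. Removing every occurrence of the search string shortens the text by (number of occurrences) x (length of the search string), so dividing the difference by that length gives the count.

```python
# Test string from the question (split across two literals for readability).
text = ("N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
        "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~")

needle = "~FC*"  # the fixed sequence to count

# Same idea as length(mycol) - length(replace(mycol, '~FC*', '')).
without_needle = text.replace(needle, "")
removed_chars = len(text) - len(without_needle)

# Each occurrence removed len(needle) characters, so divide to get the count.
# Integer division is used here; the SQL "/" may return a non-integer type.
occurrences = removed_chars // len(needle)
```

For the test string this yields 2, matching the two occurrences mentioned in the question.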

About the patterns that you tried:

  • The pattern [~FC*] is a character class: it matches a single character that is one of ~, F, C or *

  • The pattern ^~FC*$ has the anchors ^ and $ to assert the start and the end of the string, and in between it matches ~F followed by zero or more repetitions of the character C

If you want to find the 2 occurrences, you can use this pattern, which escapes the asterisk:

~FC\*

See a regex demo.
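
To make the difference between the three patterns concrete, here is a small sketch using Python's re module as a stand-in (an assumption for the sake of a runnable example; it is not Impala, whose regex functions and dialect may differ in details). It applies each pattern to the test string from the question.

```python
import re

# Test string from the question (split across two literals for readability).
text = ("N4*CITY*STATE*2155446*2120~FC*C*IND*30*MC*blah blah fjdgfeufh*27*0*****Y"
        "~FC*Z*IND*39*MC*jhlkfhfudfgsdkufgkusgfn*23*0*****Y~")

# [~FC*] is a character class: every hit is ONE character out of ~ F C *.
char_class_hits = re.findall(r"[~FC*]", text)

# ^~FC*$ must span the whole string, so it finds nothing in the test string.
anchored_hit = re.search(r"^~FC*$", text)

# It only accepts strings such as "~F", "~FC" or "~FCCC".
anchored_examples = [s for s in ("~F", "~FC", "~FCCC")
                     if re.fullmatch(r"^~FC*$", s)]

# ~FC\* with the asterisk escaped matches the literal four-character sequence.
literal_hits = re.findall(r"~FC\*", text)
```

Here literal_hits contains the two occurrences of ~FC*, while the character class returns many single-character hits and the anchored pattern returns none.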
