
Regular expression to match word followed by a pair of brackets

I have a string which is:

str2s = 'orange,juices,apple,apple[-2]'

I'm trying to extract all of those words, including the one followed by brackets, using a regex rather than str.split(). So I want:

'orange', 'juices', 'apple', 'apple[-2]'

I tried using:

re.findall(
    '[A-Za-z][A-Za-z0-9_%\\.]{0,}\[?[a-zA-Z0-9_]*\]?',
    str2s,
    flags=re.IGNORECASE
)

But it only returned:

'orange', 'juices', 'apple', 'apple['

How can I get the -2] as well?

You can start the match with a letter (a-z or A-Z), then match optional word characters, and then optionally match everything from an opening square bracket to the closing one.

\b[A-Z]\w*(?:\[[^][]*\])?

Explanation

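  • \b A word boundary to prevent a partial word match
  • [A-Z]\w* Match a char a-zA-Z (the case-insensitive flag covers lowercase) followed by optional word characters
  • (?: Non capture group
  • \[[^][]*\] Match from [ to ] with no square brackets in between
  • )? Close the non capture group and make it optional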
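To make the breakdown above easier to follow, here is a sketch of the same pattern written with re.VERBOSE so that each part carries its own comment. The names PATTERN and extract_words are only illustrative and are not part of the original answer.

```python
import re

# The same pattern, spelled out with re.VERBOSE so each part can be commented.
PATTERN = re.compile(
    r"""
    \b            # word boundary, so a match cannot start in the middle of a word
    [A-Z]         # first character is a letter (re.IGNORECASE covers lowercase)
    \w*           # any further word characters: letters, digits, underscore
    (?:           # start of the optional non-capturing group
        \[        #   a literal opening square bracket
        [^][]*    #   any characters except square brackets
        \]        #   a literal closing square bracket
    )?            # the bracketed suffix may be absent
    """,
    re.VERBOSE | re.IGNORECASE,
)


def extract_words(text):
    """Return every word, together with its bracketed suffix if it has one."""
    return PATTERN.findall(text)


print(extract_words('orange,juices,apple,apple[-2]'))
```

Because the brackets sit inside one optional group, a word is returned either with a complete [...] suffix or with none at all.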

See a regex demo and a Python demo.

import re

str2s = 'orange,juices,apple,apple[-2]'
print(re.findall(r'\b[A-Z]\w*(?:\[[^][]*\])?', str2s, flags=re.I))

Output

['orange', 'juices', 'apple', 'apple[-2]']
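As for why the attempt in the question stops at apple[: the character class between the optional brackets, [a-zA-Z0-9_]*, does not include -, so nothing after the [ can match. The sketch below reproduces that and shows a hypothetical minimal patch (my own variation, not from the question) that adds - to the class. The attempt is written as a raw string here, which I assume is equivalent for this input.

```python
import re

text = 'orange,juices,apple,apple[-2]'

# The attempt from the question: the class between the brackets has no '-'.
original = r'[A-Za-z][A-Za-z0-9_%\.]{0,}\[?[a-zA-Z0-9_]*\]?'

# Hypothetical minimal patch: allow '-' between the brackets.
patched = r'[A-Za-z][A-Za-z0-9_%\.]{0,}\[?[a-zA-Z0-9_-]*\]?'

original_result = re.findall(original, text)
patched_result = re.findall(patched, text)

print(original_result)
print(patched_result)
```

The patched version still treats [ and ] as independently optional, so it would also accept a word with only one of the two brackets. Grouping them as (?:\[...\])?, as in the pattern above, avoids that.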

To split a string into a list, I think you have to know the exact separator, or at least be able to identify the separators, whether that is , or [,.] or something else.

If you can't tell the separators apart from the items, I think it will be very hard to achieve your goal with common methods.
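When the separators are known, a rough sketch of that idea with re.split could look like this. The helper split_on is a name I made up for illustration.

```python
import re


def split_on(separators, text):
    """Split text on any single character listed in separators."""
    # re.escape keeps characters such as '.' or ']' literal inside the class.
    pattern = '[' + re.escape(separators) + ']'
    return re.split(pattern, text)


print(split_on(',', 'orange,juices,apple,apple[-2]'))
print(split_on(',.', 'orange.juices,apple'))
```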

That being said, for your example of orange,juices,apple,apple[-2], you can use r'([\w\[\]\-]+)': https://regex101.com/r/2Xj7AR/1
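A quick sketch of how that pattern behaves with re.findall. The second input string is my own example and is not from the question.

```python
import re

# One or more word characters, square brackets, or hyphens in a row.
token = re.compile(r'[\w\[\]\-]+')

first = token.findall('orange,juices,apple,apple[-2]')
second = token.findall('orange, juices; apple[-2]')

print(first)
print(second)
```

Everything outside the character class acts as a separator, so this also copes with spaces or semicolons. The trade-off is that it does not check that brackets are balanced or that they come after a word.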

The following code will extract the words the way you want:

import re

words = re.compile(r'\w+\[?-?\d*\]?', re.IGNORECASE)
s = 'orange,juices,apple,apple[-2],pineapple[20]'

print(words.findall(s))

Which will result in the following:

['orange', 'juices', 'apple', 'apple[-2]', 'pineapple[20]']

Bear in mind that the snippet above was written with the example string you provided as the base. If you need to match other types of words (007%abc, for example), you will need to adjust the regular expression to match more characters.
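As one possible adjustment, the word part can be widened to a character class that also allows % and ., and the bracket part can be grouped so that it only matches as a whole. This is just a sketch, and the test string is made up.

```python
import re

# Word part: word characters plus '%' and '.'.
# Suffix: an optional, complete [..] holding an optionally negative number.
words = re.compile(r'[\w%.]+(?:\[-?\d*\])?')
s = 'orange,007%abc,apple[-2],pineapple[20]'

result = words.findall(s)
print(result)
```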

-- Outer query: join the four quoted items with commas
SELECT RESULT,
       C1 || ',' || C2 || ',' || C3 || ',' || C4 AS R_W_APOSTROPHE
FROM (
    -- Inner query: take the Nth run of non-comma characters,
    -- then add an apostrophe at the start and at the end
    SELECT RESULT,
           REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_SUBSTR(RESULT, '[^,]+', 1, 1), '[^09+,].*'), '^|$', '''') AS C1,
           REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_SUBSTR(RESULT, '[^,]+', 1, 2), '[^09+,].*'), '^|$', '''') AS C2,
           REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_SUBSTR(RESULT, '[^,]+', 1, 3), '[^09+,].*'), '^|$', '''') AS C3,
           REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_SUBSTR(RESULT, '[^,]+', 1, 4), '[^09+,].*'), '^|$', '''') AS C4
    FROM (
        SELECT 'orange, juices, apple, apple[-2]' AS RESULT FROM dual
    )
)

This code is written for Oracle (PL/SQL), but you can reuse the algorithm. Maybe it will give you some ideas.

  • First, split the string into columns and wrap each value in apostrophes at the beginning and the end
  • Then concatenate all the columns with commas
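The same two steps could be sketched in Python roughly as follows. The function name quote_items is hypothetical, and trimming the spaces around each item is my assumption about what the query is meant to do.

```python
import re


def quote_items(text):
    """Wrap each comma-separated item in apostrophes and join them again."""
    # Step 1: take each run of non-comma characters, like REGEXP_SUBSTR with '[^,]+'.
    items = re.findall(r'[^,]+', text)
    # Assumption: surrounding spaces are not wanted, so they are stripped here.
    quoted = ["'" + item.strip() + "'" for item in items]
    # Step 2: concatenate the quoted items with commas.
    return ','.join(quoted)


print(quote_items('orange, juices, apple, apple[-2]'))
```

Unlike the query, this does not need one expression per column, so it handles any number of items.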
