简体   繁体   中英

get all occurences from a string based on find pattern in python

Suppose I have a string like this:

exp = 'CASE WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'  WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN   \'YES\' ELSE  \'NO\' END' 
exp2 = 'CASE WHEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0  THEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE")   END '

I want to return the all occurrences of WHEN and THEN along with the texts of that.

This is the expected output of exp1

['WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'','WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN   \'YES\'']

This is the expected output of exp2

['WHEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0  THEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")']

What I have tried is this:

res = re.findall(r'\s*(WHEN|When|when)+\s*(.*)\s*(THEN|Then|then)+\s*')

But the resulting list shows this output in my case

['(WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'  WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN)']

Try:

WHEN (?:(?! +(?:WHEN|ELSE)).)* # with flags=re.I
  1. WHEN - Matches 'WHEN '
  2. (?:(?? +(:.WHEN|ELSE)).) - Uses negative lookahead and states that as long as the current position does not match one or more space characters followed by either 'WHEN' or 'ELSE', then match one more character.

See Regex Demo

import re

cases = [
    'CASE WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'  WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN   \'YES\' ELSE  \'NO\' END',
    'CASE WHEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0  THEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE")   END '
]

for case in cases:
    res = re.findall(r'WHEN (?:(?! +(?:WHEN|ELSE)).)*', case, flags=re.I)
    print(res)

Prints:

['WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'', 'WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN   \'YES\'']
['WHEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0  THEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")']

Update

If you want to group the WHEN and ELSE parts (stripped of leading and trailing spaces), then use the following regex:

WHEN +(.*?) +THEN +((?:(?! +(?:WHEN|ELSE)).)*)
import re

cases = [
    'CASE WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'  WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN   \'YES\' ELSE  \'NO\' END',
    'CASE WHEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0  THEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE")   END '
]

for case in cases:
    results = re.findall(r'WHEN +(.*?) +THEN +((?:(?! +(?:WHEN|ELSE)).)*)', case, flags=re.I)
    for result in results:
        print(result[0], result[1])

Prints:

"Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"='CPU' 'YES'
"Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"='RAM' 'YES'
("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM