
Python regex: replace all occurrences of a word that starts with ":" and whose next character is a letter

I have a SQL SELECT statement (sel = "sql select text") in which variables are written as :var_name:

select  ... from (
select  ... from ( 
select  ... from  table1
where session_started between  toDateTime(:DatumOd) and toDateTime(:DatumDo)
and session_id in (select distinct ...  from table2
    where   session_start>=toDateTime('2019-01-01 10:11:12') and session_module=:channel
            and session_start between  toDateTime(:DatumOd) and toDateTime(:DatumDo)
            and ( domain_name in (:domain) or 'All domains' in (:domain) )
            and (technology in (:technology) or 'All' in (:technology))
            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (:application) 
            or 'All' in (:application) )  )
order by session_id desc , execution_id desc, step_started desc, step_id desc)
) where step_type=:step_type and ...

Variables start with ":" and end with a parenthesis or whitespace. I have to replace every :var_name with ${var_name}.

At the moment I am using re.sub(r":(\w+)", r"${\1}", sel), which gives:

select  ... from (
select  ... from ( 
select  ... from  table1
where session_started between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
and session_id in (select distinct ...  from table2
    where   session_start>=toDateTime('2019-01-01 10${11}${12}') and session_module=${channel}
            and session_start between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
            and ( domain_name in (${domain}) or 'All domains' in (${domain}) )
            and (technology in (${technology}) or 'All' in (${technology}))
            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (${application}) 
            or 'All' in (${application}) )  )
order by session_id desc , execution_id desc, step_started desc, step_id desc)
) where step_type=${step_type} and ...

Everything works well except for the date constant 2019-01-01 10:11:12. Because it contains ":" characters, the digits after each colon are treated as variable names.

I should replace only when the character right after ":" is a letter.

How can I do that?

You can use this regex, which uses a positive lookahead to ensure it only matches variables that are followed by a space, a closing parenthesis, a newline, or the end of the string:

:(\w+)(?=[ )\n]|$)

Check out this Python code:

import re

s = '''select  ... from (
select  ... from ( 
select  ... from  table1
where session_started between  toDateTime(:DatumOd) and toDateTime(:DatumDo)
and session_id in (select distinct ...  from table2
    where   session_start>=toDateTime('2019-01-01 10:11:12') and session_module=:channel
            and session_start between  toDateTime(:DatumOd) and toDateTime(:DatumDo)
            and ( domain_name in (:domain) or 'All domains' in (:domain) )
            and (technology in (:technology) or 'All' in (:technology))
            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (:application) 
            or 'All' in (:application) )  )
order by session_id desc , execution_id desc, step_started desc, step_id desc)
) where step_type=:step_type and ...:DatumOd
:DatumOd'''

print(re.sub(r':(\w+)(?=[ )\n]|$)', r'${\1}', s))

This replaces only the intended variables and leaves the colons in the date untouched:

select  ... from (
select  ... from (
select  ... from  table1
where session_started between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
and session_id in (select distinct ...  from table2
    where   session_start>=toDateTime('2019-01-01 10:11:12') and session_module=${channel}
            and session_start between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
            and ( domain_name in (${domain}) or 'All domains' in (${domain}) )
            and (technology in (${technology}) or 'All' in (${technology}))
            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (${application})
            or 'All' in (${application}) )  )
order by session_id desc , execution_id desc, step_started desc, step_id desc)
) where step_type=${step_type} and ...${DatumOd}
${DatumOd}
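A minimal sketch of how the lookahead behaves on short inputs. The helper name convert and the sample strings are made up for illustration. Note that the pattern decides based on what follows the name, not on whether the name starts with a letter, so a variable followed by a comma is skipped and a time followed by a space is still rewritten:

```python
import re

# Pattern from the answer above: a colon, a name, then a lookahead that
# requires a space, ")", a newline, or the end of the string after the name.
pattern = re.compile(r':(\w+)(?=[ )\n]|$)')

def convert(sql):
    """Hypothetical helper: rewrite :name as ${name} using the lookahead pattern."""
    return pattern.sub(r'${\1}', sql)

# Variables followed by ")" or by the end of the string are replaced.
print(convert("toDateTime(:DatumOd) and x=:channel"))

# The time inside the quoted date is followed by ":" or "'", so it is left alone.
print(convert("toDateTime('2019-01-01 10:11:12')"))

# Caveat: ":a" is followed by "," and is therefore not replaced.
print(convert("in (:a, :b)"))

# Caveat: ":11" is followed by a space, so it is replaced even though it is not a variable.
print(convert("at 10:11 today"))
```

If the SQL can contain comma-separated variables or times followed by a space, the lookahead character class would need adjusting, or a pattern that checks the first character of the name may be a better fit.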

You could try the following pattern: '\W:(\w+)', which selects a name following a colon only if the colon does not follow a word character. It works with that example, but I am unsure whether it is enough for the general requirement.
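One thing to watch when using this pattern with re.sub: the \W is part of the match, so the character before the colon is replaced too unless it is written back. The sketch below, using a made-up sample string and variable names, compares the plain pattern, a variant that captures the preceding character, and a negative lookbehind that checks it without consuming it:

```python
import re

# Hypothetical sample: one variable and one date constant.
sql = "where session_module=:channel and t>=toDateTime('2019-01-01 10:11:12')"

# Plain pattern: the "=" before the colon is consumed and disappears.
dropped = re.sub(r'\W:(\w+)', r'${\1}', sql)

# Capture the preceding character and put it back in the replacement.
kept = re.sub(r'(\W):(\w+)', r'\1${\2}', sql)

# Negative lookbehind: require "no word character before the colon"
# without making that character part of the match.
lookbehind = re.sub(r'(?<!\w):(\w+)', r'${\1}', sql)

print(dropped)
print(kept)
print(lookbehind)
```

The lookbehind form also matches a variable at the very start of the string, which '\W:(\w+)' cannot, because \W needs an actual character to match.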

As per your requirements, you may use

s = re.sub(r'\B:([^\W\d_]\w*)', r'${\1}', s)

Details

  • \B: - a : that is not preceded by a word char (or is at the start of the string)
  • ([^\W\d_]\w*) - Group 1 (\1 in the replacement pattern):
    • [^\W\d_] - any letter
    • \w* - zero or more letters, digits, or underscores.
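The parts listed above can be checked one by one on small inputs. This is only a sketch; the sample strings are made up, and each one exercises a single part of the pattern:

```python
import re

pattern = re.compile(r'\B:([^\W\d_]\w*)')

samples = [
    "(:DatumOd)",    # colon after "(" and a letter next: replaced
    "x=:step_type",  # colon after "=" and a letter next: replaced
    "'10:11:12'",    # colon after a digit: \B fails, unchanged
    "(:1abc)",       # digit right after the colon: unchanged
    "(:_tmp)",       # underscore right after the colon: unchanged
    "a:b",           # colon after a letter: \B fails, unchanged
]

results = [pattern.sub(r'${\1}', text) for text in samples]
for text, result in zip(samples, results):
    print(text, '->', result)
```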

NOTE: If you want to match only ASCII letters and digits and you are using Python 3.x, use the re.A (re.ASCII) flag:

s = re.sub(r'\B:([^\W\d_]\w*)', r'${\1}', s, flags=re.A)
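To see what the flag changes, here is a small sketch with a made-up variable name that starts with a non-ASCII letter. By default, Python 3 str patterns treat "č" as a letter; with re.A it no longer counts as a word character, so that variable is left alone:

```python
import re

# Hypothetical sample: ":čas" starts with a non-ASCII letter, ":id" is plain ASCII.
text = "where t=:čas and id=:id"

# Default (Unicode) matching: both names are replaced.
unicode_result = re.sub(r'\B:([^\W\d_]\w*)', r'${\1}', text)

# ASCII-only matching: "č" is not a word character, so ":čas" stays as it is.
ascii_result = re.sub(r'\B:([^\W\d_]\w*)', r'${\1}', text, flags=re.A)

print(unicode_result)
print(ascii_result)
```

Which behavior is right depends on whether variable names in the SQL are expected to be ASCII only.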

Python demo:

import re
s = "select  ... from (\r\nselect  ... from ( \r\nselect  ... from  table1\r\nwhere session_started between  toDateTime(:DatumOd) and toDateTime(:DatumDo)\r\nand session_id in (select distinct ...  from table2\r\n    where   session_start>=toDateTime('2019-01-01 10:11:12') and session_module=:channel\r\n            and session_start between  toDateTime(:DatumOd) and toDateTime(:DatumDo)\r\n            and ( domain_name in (:domain) or 'All domains' in (:domain) )\r\n            and (technology in (:technology) or 'All' in (:technology))\r\n            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (:application) \r\n            or 'All' in (:application) )  )\r\norder by session_id desc , execution_id desc, step_started desc, step_id desc)\r\n) where step_type=:step_type and ..."
s = re.sub(r'\B:([^\W\d_]\w*)', r'${\1}', s, flags=re.A)
print(s)

Output:

select  ... from (
select  ... from ( 
select  ... from  table1
where session_started between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
and session_id in (select distinct ...  from table2
    where   session_start>=toDateTime('2019-01-01 10:11:12') and session_module=${channel}
            and session_start between  toDateTime(${DatumOd}) and toDateTime(${DatumDo})
            and ( domain_name in (${domain}) or 'All domains' in (${domain}) )
            and (technology in (${technology}) or 'All' in (${technology}))
            and (CASE when session_principal_role='Self care' then agent_name else session_principal_role end in  (${application}) 
            or 'All' in (${application}) )  )
order by session_id desc , execution_id desc, step_started desc, step_id desc)
) where step_type=${step_type} and ...
