
Python: re.findall and re.sub with '^'

I am trying to convert a string such as s='2.3^2+3^3-√0.04*2+√4' so that 2.3^2 becomes pow(2.3,2), 3^3 becomes pow(3,3), √0.04 becomes sqrt(0.04), and √4 becomes sqrt(4).

import re

s='2.3^2+3^3-√0.04*2+√4'
patt1=r'[0-9]+\.[0-9]+\^[0-9]+|[0-9]+\^[0-9]+'
patt2=r'√[0-9]+\.[0-9]+|√[0-9]+'
idx1=re.findall(patt1, s)
idx2=re.findall(patt2, s)
idx11=[]
idx22=[]
for i in range(len(idx1)):
    idx11.append('pow('+idx1[i][:idx1[i].find('^')]+','+idx1[i][idx1[i].find('^')+1:]+')')

for i in range(len(idx2)):
    idx22.append('sqrt('+idx2[i][idx2[i].find('√')+1:]+')')

for i in range(len(idx11)):
    s=re.sub(idx1[i], idx11[i], s)

for i in range(len(idx22)):
    s=re.sub(idx2[i], idx22[i], s)

print(s)

Intermediate results:

idx1=['2.3^2', '3^3']
idx2=['√0.04', '√4']
idx11=['pow(2.3,2)', 'pow(3,3)']
idx22=['sqrt(0.04)', 'sqrt(4)']

But the resulting string is:

2.3^2+3^3-sqrt(0.04)*2+sqrt(4)

Why is idx1 computed correctly, yet re.sub does not insert these values into the string? (Sorry for my English.)

Try this using only re.sub()

Input string:

s='2.3^2+3^3-√0.04*2+√4'

Replacing for pow()

s = re.sub(r"(\d+(?:\.\d+)?)\^(\d+)", "pow(\\1,\\2)", s)

Replacing for sqrt()

s = re.sub(r"√(\d+(?:\.\d+)?)", "sqrt(\\1)", s)

Output:

pow(2.3,2)+pow(3,3)-sqrt(0.04)*2+sqrt(4)

The parentheses () capture a group, and \\1 in the replacement refers to the first captured group of the match (\\2 to the second). The (?:...) form groups without capturing. Using this link you can get a detailed explanation of the regex.
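To make the group numbering concrete, here is a small sketch that inspects the groups captured by the pow pattern above. The sample string is just the first part of the question's input; nothing here goes beyond the standard re module.

```python
import re

pattern = r"(\d+(?:\.\d+)?)\^(\d+)"

# re.search returns the first match; group(0) is the whole match,
# group(1) and group(2) are the two capturing groups.
m = re.search(pattern, "2.3^2+3^3")
whole, base, exponent = m.group(0), m.group(1), m.group(2)
print(whole, base, exponent)

# With several capturing groups, re.findall returns one tuple per match.
pairs = re.findall(pattern, "2.3^2+3^3")
print(pairs)
```

The (?:\.\d+)? part is non-capturing, so the exponent stays in group 2 rather than moving to group 3.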

I only have Python 2.7.5, but this works for me using str.replace rather than re.sub. Once you have gone to the effort of finding the matches and constructing their replacements, this is a simple find-and-replace job:

for i in range(len(idx11)):
    s = s.replace(idx1[i], idx11[i])

for i in range(len(idx22)):
    s = s.replace(idx2[i], idx22[i])
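As for why re.sub left the pow terms untouched: each match was reused as a regex pattern, and ^ is a metacharacter there (it anchors to the start of the string), so a pattern like '2.3^2' can never match. The √ matches happened to work because their only metacharacter is '.', which also matches a literal dot. If you would rather keep re.sub, one option is to escape each match with re.escape first. A minimal sketch:

```python
import re

s = '2.3^2+3^3-√0.04*2+√4'

# Unescaped: '^' acts as an anchor, so nothing is replaced.
unescaped = re.sub('2.3^2', 'pow(2.3,2)', s)

# Escaped: re.escape builds a pattern that matches the literal text '2.3^2'.
escaped = re.sub(re.escape('2.3^2'), 'pow(2.3,2)', s)

print(unescaped)
print(escaped)
```

str.replace avoids the problem entirely because it never interprets its argument as a pattern.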

edit

I think you're going about this in quite a long-winded way. You can use re.sub in one go to make these changes:

s = re.sub(r'(\d+(\.\d+)?)\^(\d+)', r'pow(\1,\3)', s)

This replaces 2.3^2+3^3 with pow(2.3,2)+pow(3,3), and:

s = re.sub(r'√(\d+(\.\d+)?)', r'sqrt(\1)', s)

replaces √0.04*2+√4 with sqrt(0.04)*2+sqrt(4).

A few things here differ from your code. First, \d matches a digit, the same as [0-9]. Second, the parentheses ( ) capture whatever is inside them. In the replacement, you can refer to these captured groups by number, in the order in which their opening parentheses appear. In the pow example I use the first and third groups; the second group is the optional fractional part nested inside the first.

The r prefix before the replacement string marks it as a "raw" string, so backslashes are taken literally. The groups are accessed with \1, \2, etc., but because the backslash is an escape character in ordinary string literals, without the r I would have to escape it each time (\\1, \\2, etc.).
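A quick way to check the raw-string point is to compare the two spellings directly. This sketch uses only the first half of the question's string as input:

```python
import re

s = '2.3^2+3^3'
pattern = r'(\d+(\.\d+)?)\^(\d+)'

# The raw literal r'\1' and the escaped literal '\\1' are the same two characters.
same_literal = (r'\1' == '\\1')

# So both replacement strings produce the same result.
with_raw = re.sub(pattern, r'pow(\1,\3)', s)
with_escaped = re.sub(pattern, 'pow(\\1,\\3)', s)

print(same_literal)
print(with_raw)
print(with_escaped)
```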
