
Remove leading zeros from a complex executable Python string

I am working with Grammatical Evolution (GE) on Python 3.7. My grammar generates executable strings in the format:

np.where(<variable> <comparison_sign> <constant>, (<probability1>), (<probability2>))

However, the string can get quite complex, with several chained np.where calls.

In some cases <constant> contains leading zeros, which makes the executable string raise an error. GE is expected to generate expressions containing leading zeros; however, I have to detect and remove them. An example of a generated expression containing leading zeros:

"np.where(x < 02, np.where(x > 01.5025, (0.9), (0.5)), (1))"

Problem:

  • There are two types of numbers containing leading zeros: int and float.
  • Suppose I detect "02" in the string. If I replace every occurrence of "02" with "2", the float "01.5025" is also changed to "01.525", which alters its value and must not happen.
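The second point can be reproduced with a plain str.replace call. This is only an illustration of the failure described above, using the example string from the question:

```python
expression = "np.where(x < 02, np.where(x > 01.5025, (0.9), (0.5)), (1))"

# Naive approach: replace every "02" substring, wherever it occurs.
naive = expression.replace("02", "2")

print(naive)
# np.where(x < 2, np.where(x > 01.525, (0.9), (0.5)), (1))
# The integer is fixed, but the "02" inside "5025" was rewritten too,
# and the leading zero of "01.5025" is still there.
```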

I've made several attempts with different re patterns, but couldn't solve it. To detect that an executable string contains leading zeros, I use:

try:
  _ = eval(expression)
except SyntaxError:
  new_expression = fix_expressions(expression)

I need help building the fix_expressions Python function.

You could try to come up with a regular expression for numbers with leading zeros and then replace the leading zeros.

import re

def remove_leading_zeros(string):
    return re.sub(r'([^\.^\d])0+(\d)', r'\1\2', string)

print(remove_leading_zeros("np.where(x < 02, np.where(x > 01.5025, (0.9), (0.5)), (1))"))

# output: np.where(x < 2, np.where(x > 1.5025, (0.9), (0.5)), (1))

The remove_leading_zeros function finds every occurrence of [^\.^\d]0+\d and removes the zeros. The pattern reads: one character that is neither a dot nor a digit, followed by one or more zeros, followed by a digit. The parentheses ( , ) in the regex mark capture groups, which preserve the character before the leading zeros and the digit after them. Note that the second ^ inside the character class is a literal caret, so the class also excludes ^; [^.\d] expresses the intended class without that side effect.
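To see what each capture group holds, the matches can be listed with re.finditer. This is a small inspection sketch using the same pattern and the example string from the question; the variable names are only for illustration:

```python
import re

pattern = re.compile(r'([^\.^\d])0+(\d)')
expression = "np.where(x < 02, np.where(x > 01.5025, (0.9), (0.5)), (1))"

# Collect (whole match, group 1, group 2) for every match.
matches = [(m.group(0), m.group(1), m.group(2)) for m in pattern.finditer(expression)]

for whole, before, after in matches:
    print(repr(whole), repr(before), repr(after))
# ' 02' ' ' '2'
# ' 01' ' ' '1'
```

Values such as 0.9 and 0.5 are not matched because the zero is followed by a dot, not a digit, and the 0 inside 5025 is not matched because it is preceded by a digit.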


Regarding Csaba Toth's comment:

The problem with 02+03*04 is that the first number sits at the very beginning of the string, so there is no preceding character for the first capture group to match. The regex can be modified so that the first capture group also matches the beginning of the string:

r"(^|[^\.^\d])0+(\d)"
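Putting the pieces together, a minimal sketch of the fix_expressions function the question asks for might look like the following. It assumes the tidier character class [^.\d] in place of [^\.^\d], and it uses ast.parse instead of eval for the syntax check so the sketch runs without NumPy; the is_valid helper is a hypothetical name introduced here, not part of the original answer.

```python
import ast
import re

# One character that is not a dot or digit (or the start of the string),
# then one or more zeros, then a digit.
LEADING_ZEROS = re.compile(r"(^|[^.\d])0+(\d)")


def fix_expressions(expression):
    """Drop leading zeros from numbers, keeping the surrounding characters."""
    return LEADING_ZEROS.sub(r"\1\2", expression)


def is_valid(expression):
    """Return True if the string parses as a Python expression (hypothetical helper)."""
    try:
        ast.parse(expression, mode="eval")
    except SyntaxError:
        return False
    return True


expression = "02+03*04"
if not is_valid(expression):
    expression = fix_expressions(expression)

print(expression)        # 2+3*4
print(eval(expression))  # 14
```

Parsing only checks the syntax, so names such as np and x do not need to be defined at that point; the fixed string can then be passed to eval as in the question.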

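You can remove leading zeros from a string using .lstrip(). Note that it only strips zeros at the very start of the string, so it suits a single number held as a string rather than a whole expression: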

str_num = "02.02025"

print("Initial string: %s \n" % str_num)

str_num = str_num.lstrip("0")

print("Removing leading 0's with lstrip(): %s" % str_num)
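To apply the lstrip idea to numbers embedded in a longer expression, one option is to let re.sub locate each number and call a function on it. This is a sketch under that assumption, not part of the original answer; NUMBER, strip_zeros and fix_with_lstrip are hypothetical names.

```python
import re

# A run of digits with an optional fractional part, not preceded by a
# letter, digit, underscore or dot (so names like x1 are left alone).
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?")


def strip_zeros(match):
    """Strip leading zeros from the integer part of one matched number."""
    int_part, dot, frac_part = match.group(0).partition(".")
    # lstrip turns "0" or "00" into "", so fall back to a single "0".
    int_part = int_part.lstrip("0") or "0"
    return int_part + dot + frac_part


def fix_with_lstrip(expression):
    return NUMBER.sub(strip_zeros, expression)


print(fix_with_lstrip("np.where(x < 02, np.where(x > 01.5025, (0.9), (0.5)), (1))"))
# np.where(x < 2, np.where(x > 1.5025, (0.9), (0.5)), (1))
```

The fallback to "0" matters because a bare lstrip("0") would turn 0.9 into .9 and 0 into an empty string.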
