
Check if a variable substring is in a string

I receive an input string whose values are expressed in one of two possible formats. For example:

#short format
data = '"interval":19'

>>> "interval":19


#extended format
data = '"interval":{"t0":19,"tf":19}'

>>> "interval":{"t0":19,"tf":19}

I would like to check whether the short format is used and, if so, convert it to the extended format.

Since the string could be composed of multiple values, e.g.

data = '"interval":19,"interval2":{"t0":10,"tf":15}'

>>> "interval":19,"interval2":{"t0":10,"tf":15}

I cannot just say:

if ":{" not in data:
    #then short format is used

I would like to code something like:

if ":$(a general int/float/double number)" in data:
    #extract the number
    #replace ":{number}" with the extended format

I know how to code the replacing part. I need help implementing the if condition: in my mind, I model it as a variable substring, in which the variable part is the number, while the rigid part is the $(value name) + ":" prefix.

  "some_value":19
       ^       ^
 rigid format  variable part

EDIT - WHY NOT PARSE IT?

I know the string is "JSON-friendly" and that I can convert it into a dictionary and then easily access the values.

Indeed, I already have this solution in my code. But I don't like it, since the input string could be multilevel and I need to iterate over the leaf values of the resulting dictionary regardless of how deeply they are nested. That is not a simple thing to do.

So I was wondering whether a way to act directly on the string exists.

If you rewrite every key that is followed by a number, except the keys t0 and tf, it should work.
Here is an example on a multilevel string (it could probably be put in better shape):

import re

s = '"interval": 19,"t0interval2":{"t0":10,"tf":15},{"deeper": {"other_interval":23}}'

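# Raw string, so that \w, \s and \d reach the regex engine unchanged.
# Group 1 is the key, group 2 sits inside the lookahead, group 3 is the number.
gex = r'("(?!(t0|tf)")\w+":)\s*(\d+)'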
new_s = re.sub(gex, r'\1 {"t0": \3, "tf": \3}', s)
print(new_s)
# -> "interval": {"t0": 19, "tf": 19},"t0interval2":{"t0":10,"tf":15},{"deeper": {"other_interval": {"t0": 23, "tf": 23}}}

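The same idea can be sketched with a non-capturing lookahead, so that the number becomes group 2, and with a number pattern that also accepts decimals. This is only a variation under assumptions: the sample string and the name `pattern` are made up for illustration, and numbers in exponent notation are not handled.

```python
import re

# Hypothetical sample input with one float and one nested value.
s = '"interval": 19.5,"t0interval2":{"t0":10,"tf":15},"deeper": {"other_interval":23}'

# (?:...) does not create a group, so the number is group 2 here.
# -?\d+(?:\.\d+)? accepts an optional sign and an optional decimal part.
pattern = r'("(?!(?:t0|tf)")\w+":)\s*(-?\d+(?:\.\d+)?)'

new_s = re.sub(pattern, r'\1 {"t0": \2, "tf": \2}', s)
print(new_s)
# -> "interval": {"t0": 19.5, "tf": 19.5},"t0interval2":{"t0":10,"tf":15},"deeper": {"other_interval": {"t0": 23, "tf": 23}}
```
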
You could use a regular expression. ("interval":)(\d+) will look for the string '"interval":' followed by one or more digits.

Let's test this

import re

data = '"interval":19,"interval2":{"t0":10,"tf":15},"interval":25'
result = re.sub(r'("interval":)(\d+)', r'xxx', data)
print(result)
# -> xxx,"interval2":{"t0":10,"tf":15},xxx

We see that we found the correct places. Now we're going to create your target format. Here the matched groups come in handy. In the regular expression, ("interval":) is group 1 and (\d+) is group 2.
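If only the "is the short format present?" check from the question is needed, the groups can be inspected directly with re.search. This is a minimal sketch; the variable names `key_part` and `number` are just illustrative.

```python
import re

data = '"interval":19,"interval2":{"t0":10,"tf":15},"interval":25'

# re.search returns the first match, or None if the short format is absent.
match = re.search(r'("interval":)(\d+)', data)
if match:
    key_part = match.group(1)  # the rigid part: '"interval":'
    number = match.group(2)    # the variable part, still a string: '19'
    print(key_part, number)
```
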

Now we use the content of those groups to create your wanted result.

data = '"interval":19,"interval2":{"t0":10,"tf":15},"interval":25'
result = re.sub(r'("interval":)(\d+)', r'\1{"t0":\2,"tf":\2}', data)
print(result)
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":10,"tf":15},"interval":{"t0":25,"tf":25}

If there are floating-point values involved, you'll have to change (\d+) to ([.\d]+).
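Note that [.\d]+ is a loose pattern: it accepts any run of dots and digits. The sketch below compares it with a stricter alternative. The stricter pattern is an assumption on my part, not part of the original answer, and it still ignores exponent notation.

```python
import re

loose = re.compile(r'[.\d]+')              # any mix of dots and digits
strict = re.compile(r'-?\d+(?:\.\d+)?')    # optional sign, optional decimals

samples = ['19', '25.4', '-3', '1.2.3', '.']
# Map each sample to (accepted by loose, accepted by strict).
report = {s: (bool(loose.fullmatch(s)), bool(strict.fullmatch(s))) for s in samples}
for text, (loose_ok, strict_ok) in report.items():
    print(f'{text!r}: loose={loose_ok}, strict={strict_ok}')
```

The loose class accepts '1.2.3' and a lone '.', and rejects '-3'. The stricter pattern does the opposite in all three cases.
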

If you want to match any key made of Unicode word characters, and not only interval, you can use the special sequence \w. Because a key can be several characters long, the expression becomes \w+ .

data = '"interval":19,"interval2":{"t0":10,"tf":15},"Monty":25.4'
result = re.sub(r'("\w+":)([.\d]+)', r'\1{"t0":\2,"tf":\2}', data)
print(result)
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":{"t0":10,"tf":10},"tf":{"t0":15,"tf":15}},"Monty":{"t0":25.4,"tf":25.4}

Dang! Yes, we found "Monty", but now the values inside the second part are matched too. We'll have to fix this somehow. Let's see. We don't want ("\w+":) to match if it's preceded by {, so we're going to use a negative lookbehind assertion: (?<!{)("\w+":) . And after the number part ([.\d]+) we don't want a } or another digit, so we're using negative lookahead assertions here: ([.\d]+)(?!})(?!\d) . The (?!\d) is needed because otherwise the engine could backtrack and match only the 1 of 15, which is not followed by }.

data = '"interval":19,"interval2":{"t0":10,"tf":15},"Monty":25.4'
result = re.sub(r'(?<!{)("\w+":)([.\d]+)(?!})(?!\d)', r'\1{"t0":\2,"tf":\2}', data)
print(result)
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":10,"tf":15},"Monty":{"t0":25.4,"tf":25.4}

Hooray, it works!
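One caveat is worth checking before relying on this: the lookarounds only protect the first and last key of a nested object. With three or more keys (a hypothetical input, not taken from the question), a middle key is neither preceded by { nor followed by }, so it is the only one rewritten:

```python
import re

pattern = r'(?<!{)("\w+":)([.\d]+)(?!})(?!\d)'

# Hypothetical nested object with three keys.
data = '"outer":{"a":1,"b":2,"c":3}'
result = re.sub(pattern, r'\1{"t0":\2,"tf":\2}', data)
print(result)
# -> "outer":{"a":1,"b":{"t0":2,"tf":2},"c":3}
```

Whether nested numbers should be expanded or left alone, treating "b" differently from "a" and "c" is inconsistent.
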

Regular expressions are powerful and fun, but if you start to add more constraints this might become unmanageable.
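When that point is reached, the parsing route mentioned in the question can be kept fairly short with a recursive walk. The sketch below rests on several assumptions: the fragment becomes valid JSON once wrapped in braces, a dictionary whose keys are exactly t0 and tf counts as "already extended", and the helper name `extend` is made up for illustration.

```python
import json

def extend(node):
    """Recursively replace bare numbers with {"t0": n, "tf": n}."""
    if isinstance(node, dict):
        # Assumption: exactly the keys t0 and tf means already extended.
        if set(node) == {"t0", "tf"}:
            return node
        return {key: extend(value) for key, value in node.items()}
    if isinstance(node, list):
        return [extend(item) for item in node]
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(node, (int, float)) and not isinstance(node, bool):
        return {"t0": node, "tf": node}
    return node  # strings and None are left untouched

data = '"interval":19,"interval2":{"t0":10,"tf":15},"deeper":{"other":2.5}'
parsed = json.loads("{" + data + "}")   # wrap the fragment to make it an object
# Compact separators, then strip the outer braces added above.
result = json.dumps(extend(parsed), separators=(",", ":"))[1:-1]
print(result)
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":10,"tf":15},"deeper":{"other":{"t0":2.5,"tf":2.5}}
```

The recursion visits leaf values at any depth, which is the part the question describes as hard to do by hand.
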
