
Check if a variable substring is in a string

I receive an input string whose values are expressed in one of two possible formats, e.g.:

#short format
data = '"interval":19'

>>> "interval":19


#extended format
data = '"interval":{"t0":19,"tf":19}'

>>> "interval":{"t0":19,"tf":19}

I would like to check whether the short format is used and, if so, convert it to the extended format.

Since the string can contain multiple values, e.g.

data = '"interval":19,"interval2":{"t0":10,"tf":15}'

>>> "interval":19,"interval2":{"t0":10,"tf":15}

I cannot simply write:

if ":{" not in data:
    #then short format is used

I would like to write something like:

if ":$(a general int/float/double number)" in data:
    #extract the number
    #replace ":{number}" with the extended format

I know how to code the replacement part. I need help implementing the if condition: I think of it as a variable substring, in which the variable part is the number and the fixed part is the value name followed by ":".

  "some_value":19
       ^       ^
 rigid format  variable part

EDIT - WHY NOT PARSE IT?

I know the string is "JSON-friendly" and that I can convert it into a dictionary and then easily access the values.

Indeed, I already have this solution in my code. But I don't like it, because the input string can be multilevel and I need to iterate over the leaf values of the resulting dictionary regardless of how deeply they are nested. That is not a simple thing to do.

So I was wondering whether there is a way to act directly on the string.

If you replace every key other than t0 and tf that is followed by a number, it should work.
Here is an example on a multilevel string, which could probably be tidied up:

import re

s = '"interval": 19,"t0interval2":{"t0":10,"tf":15},{"deeper": {"other_interval":23}}'

# Raw string, so that \w, \s and \d reach the regex engine unchanged
gex = r'("(?!(t0|tf)")\w+":)\s*(\d+)'
new_s = re.sub(gex, r'\1 {"t0": \3, "tf": \3}', s)
print(new_s)
# -> "interval": {"t0": 19, "tf": 19},"t0interval2":{"t0":10,"tf":15},{"deeper": {"other_interval": {"t0": 23, "tf": 23}}}

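The pattern in this answer is dense, so here is the same expression spelled out with re.VERBOSE, as a reading aid rather than a different method. The name SHORT_VALUE is made up for this sketch. Note that group 2 lives inside the negative lookahead and never captures anything on a successful match, which is why the replacement refers to \1 and \3.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part can be commented.
# SHORT_VALUE is a hypothetical name used only in this sketch.
SHORT_VALUE = re.compile(r'''
    (                 # group 1: the key, including its quotes and the colon
      "
      (?!(t0|tf)")    # group 2 is inside a negative lookahead: skip keys t0 and tf
      \w+
      ":
    )
    \s*               # optional whitespace after the colon
    (\d+)             # group 3: the integer value
''', re.VERBOSE)

s = '"interval": 19,"t0interval2":{"t0":10,"tf":15},{"deeper": {"other_interval":23}}'
new_s = SHORT_VALUE.sub(r'\1 {"t0": \3, "tf": \3}', s)
print(new_s)
```

Because the lookahead only rejects the exact keys "t0" and "tf", a key such as "t0interval2" is still eligible, as the sample string shows.
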
You could use a regular expression. ("interval":)(\d+) will look for the string '"interval":' followed by one or more digits.

Let's test this:

import re

data = '"interval":19,"interval2":{"t0":10,"tf":15},"interval":25'
result = re.sub(r'("interval":)(\d+)', r'xxx', data)
print(result)
# -> xxx,"interval2":{"t0":10,"tf":15},xxx

We see that we found the correct places. Now we're going to create your target format. Here the matched groups come in handy: in the regular expression, ("interval":) is group 1 and (\d+) is group 2.

Now we use the content of those groups to build the result you want.

data = '"interval":19,"interval2":{"t0":10,"tf":15},"interval":25'
result = re.sub(r'("interval":)(\d+)', r'\1{"t0":\2,"tf":\2}', data)
print(result)
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":10,"tf":15},"interval":{"t0":25,"tf":25}

If floating point values are involved, you'll have to change (\d+) to ([.\d]+).

If you want to match any key made of Unicode word characters, and not only interval, you can use the special sequence \w. Because a key can be several characters long, the expression becomes \w+.

data = '"interval":19,"interval2":{"t0":10,"tf":15},"Monty":25.4'
result = re.sub(r'("\w+":)([.\d]+)', r'\1{"t0":\2,"tf":\2}', data)
print(result)
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":{"t0":10,"tf":10},"tf":{"t0":15,"tf":15}},"Monty":{"t0":25.4,"tf":25.4}

Dang! Yes, we found "Monty", but now the values inside the extended part are matched too. We'll have to fix this somehow. Let's see. We don't want ("\w+":) if it's preceded by {, so we're going to use a negative lookbehind assertion: (?<!{)("\w+":). And after the number part ([.\d]+) we don't want a } or another digit, so we use negative lookahead assertions here: ([.\d]+)(?!})(?!\d). The (?!\d) is needed because the engine would otherwise backtrack and accept the 1 in 15} as a complete number.

data = '"interval":19,"interval2":{"t0":10,"tf":15},"Monty":25.4'
result = re.sub(r'(?<!{)("\w+":)([.\d]+)(?!})(?!\d)', r'\1{"t0":\2,"tf":\2}', data)
print(result)
# -> "interval":{"t0":19,"tf":19},"interval2":{"t0":10,"tf":15},"Monty":{"t0":25.4,"tf":25.4}

Hooray, it works!
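
The question asked for the if condition specifically, so here is a minimal sketch that wraps the final pattern in a check and a rewrite. The names SHORT, has_short_format and expand are made up for this example. The number part is also a stricter variant than [.\d]+, which is an assumption on my side: with [.\d]+ and (?!})(?!\d), a float such as 1.5 directly before } can be matched partially as 1, so this version forbids a following dot as well.

```python
import re

# Hypothetical names; the number pattern is a stricter variant, not the one from the answer.
SHORT = re.compile(
    r'(?<!\{)'            # the key must not directly follow an opening brace
    r'("\w+":)'           # group 1: the key with its colon
    r'(-?\d+(?:\.\d+)?)'  # group 2: an int or a simple decimal number
    r'(?![\d.}])'         # must not be cut short, nor sit right before a closing brace
)

def has_short_format(data):
    """Return True if at least one value is written in the short format."""
    return SHORT.search(data) is not None

def expand(data):
    """Rewrite every short-format value n as {"t0":n,"tf":n}."""
    return SHORT.sub(r'\1{"t0":\2,"tf":\2}', data)

data = '"interval":19,"interval2":{"t0":10,"tf":15.5},"Monty":25.4'
if has_short_format(data):
    data = expand(data)
print(data)
```

Like the pattern it is based on, this still assumes that an extended value is written exactly as {"t0":...,"tf":...}, with no whitespace and with t0 first and tf last. A short value that happens to be the first or last entry of some other nested object would be skipped.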

Regular expressions are powerful and fun, but if you start adding more constraints, this might become unmanageable.
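
If the constraints do grow, the parse-and-walk approach mentioned in the question may end up simpler than it looks. The following is only a sketch under two assumptions: the input is the body of a valid JSON object (so it can be wrapped in braces), and a dictionary whose keys are exactly t0 and tf is an already extended value. The function name expand_leaves is made up.

```python
import json

def expand_leaves(node):
    """Recursively replace bare numbers with {"t0": n, "tf": n}."""
    if isinstance(node, dict):
        # Assumption: a dict with exactly these two keys is already extended.
        if set(node) == {"t0", "tf"}:
            return node
        return {key: expand_leaves(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_leaves(item) for item in node]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(node, (int, float)) and not isinstance(node, bool):
        return {"t0": node, "tf": node}
    return node

data = '"interval":19,"interval2":{"t0":10,"tf":15},"outer":{"inner":2.5}'
parsed = json.loads("{" + data + "}")  # assumption: data is a JSON object body
result = expand_leaves(parsed)

# Serialize without spaces and drop the outer braces again.
print(json.dumps(result, separators=(",", ":"))[1:-1])
```

The recursion handles any nesting depth, so there is no need to know the dictionary levels in advance.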
