How to modify a text within string using regular expressions in python?

I'm trying to change a file containing strings like:

Record 1 : 
{ "K1":"value1" , 
  "K2":"value2" 
}

Record 2 :
{ "K1":"value3" , 
  "K2":"value4" 
}

to

{
    "Record_1" : 
        { "K1": "value1", 
          "K2": "value2" 
        },

    "Record_2" :
        { "K1": "value3", 
          "K2": "value4" 
        }
}

(to make it into a correct JSON format).
The part of code that I'm having problems with is:

pattern = r"(\s*)Record (\d+):"
all_records_json = re.sub(pattern, "\"Record_"+ ??? + "\" : ",all_records)

And I don't know what to put instead of ??? so that it reads the (\d+) part that matched the pattern.

First, your pattern doesn't actually match your data:

>>> import re
>>> all_records = '''Record 2 :
... { "K1":"value3" , 
...   "K2":"value4" 
... }'''
>>> pattern = r"(\s*)Record (\d+):"
>>> re.findall(pattern, all_records)
[]

That's because your data has a space between the digits and the colon. You need to fix that. While we're at it, I have no idea why you're putting a group around the preceding whitespace, so let's not do that. So we get:

>>> pattern = r"\s*Record (\d+)\s*:"
>>> re.findall(pattern, all_records)
['2']

Now, your only capturing group is the \d+, so that will be group 1, which you can include in the substitution as \1. So:

>>> print(re.sub(pattern, r'"Record_\1": ', all_records))
"Record_2":
{ "K1":"value3" ,
  "K2":"value4"
}

That still isn't valid JSON, but it's what you wanted, right?
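If the end goal is the fully valid JSON shown in the question, one possible route is to split on the record headers instead of substituting them, then let the json module do the assembly. This is only a sketch: it assumes every record body is already a valid JSON object on its own, and the names header, parts and records are made up for illustration.

```python
import json
import re

all_records = '''Record 1 : 
{ "K1":"value1" , 
  "K2":"value2" 
}

Record 2 :
{ "K1":"value3" , 
  "K2":"value4" 
}'''

# Same header pattern as in the answer above.
header = re.compile(r"\s*Record (\d+)\s*:")

# With a capturing group in the pattern, split() keeps the captured text,
# so the result is [text_before, number, body, number, body, ...].
parts = header.split(all_records)
numbers = parts[1::2]
bodies = parts[2::2]

# Assumption: each body parses as a JSON object by itself.
records = {f"Record_{n}": json.loads(body) for n, body in zip(numbers, bodies)}

all_records_json = json.dumps(records, indent=4)
print(all_records_json)
```

Going through json.loads and json.dumps means the commas between records and the outer braces come out right without any hand-written string surgery.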

If you read the docs, re.sub explains that "Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern." For full details on backreferences for both (…) groups and (?P<name>…) groups, look them up in the Regular Expression Syntax. You should also read the Regular Expression HOWTO, which explains all of this in a more novice-friendly way.
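To make the backreference forms concrete, here is a small sketch of three equivalent ways to fill in the ??? from the question: the \g<1> spelling of a numbered group, a named group, and a replacement function. The group name num and the function name to_key are arbitrary choices for this example.

```python
import re

all_records = '''Record 2 :
{ "K1":"value3" , 
  "K2":"value4" 
}'''

# Numbered group, written as \g<1> (same meaning as \1 in the template).
numbered = re.sub(r"\s*Record (\d+)\s*:", r'"Record_\g<1>": ', all_records)

# Named group: (?P<num>...) in the pattern, \g<num> in the replacement.
named = re.sub(r"\s*Record (?P<num>\d+)\s*:", r'"Record_\g<num>": ', all_records)

# Replacement function: re.sub calls it with each match object,
# which is closest to the string concatenation tried in the question.
def to_key(match):
    return '"Record_' + match.group(1) + '": '

via_function = re.sub(r"\s*Record (\d+)\s*:", to_key, all_records)

print(numbered)
```

The \g<1> form is mainly useful when the group reference is followed by a literal digit, where \1 followed by that digit would be read as a different group number.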

Notice that I made the substitution a raw string just like the pattern, so I didn't have to escape the \1, and I also used single instead of double quotes, so I didn't have to escape the " characters.
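As a quick illustration of why the raw string matters, the sketch below compares the raw replacement with its fully escaped equivalent, and shows what happens if the backslash is left unescaped in an ordinary string. The variable names are made up for this example.

```python
import re

pattern = r"\s*Record (\d+)\s*:"

raw = r'"Record_\1": '            # raw string, single quotes
escaped = "\"Record_\\1\": "      # ordinary string, everything escaped
print(raw == escaped)             # the two spellings are the same string

# In an ordinary string, "\1" is an octal escape for the character chr(1),
# so the backreference is lost before re.sub ever sees it.
broken = '"Record_\1": '
print(repr(re.sub(pattern, raw, "Record 2 :")))
print(repr(re.sub(pattern, broken, "Record 2 :")))
```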
