Python regex replace substrings inside strings
I have a string like this:
import re
text = """
Some stuff to keep <b>here</b>
CODE
<b>Replace gt and lt</b>
<i>inside <script>this</script> code</i>
CODE
Some more stuff to keep <b>here</b>
"""
And the expected output is:
Some stuff to keep <b>here</b>
CODE
_LT_b_GT_Replace gt and lt_LT_/b_GT_
_LT_i_GT_inside _LT_script_GT_this_LT_/script_GT_ code_LT_/i_GT_
CODE
Some more stuff to keep <b>here</b>
Here's a small subset of what I've tried:
# None of these work; they typically replace only the first or last occurrence of <
re.sub(r'(?<=CODE)<(?=CODE)', r'_LT_', text, flags=re.DOTALL)
re.sub(r'(?<=CODE)(.*?)<(.*?)(?=CODE)', r'\1_LT_\2', text, flags=re.DOTALL)
re.sub(r'(?<=CODE)(.*?)[<]*(.*?)(?=CODE)', r'\1_LT_\2', text, flags=re.DOTALL|re.MULTILINE)
re.sub(r'(CODE.*?)<(.*?CODE)', r'\1_LT_\2', text, flags=re.DOTALL)
re.sub(r'(CODE.*)<(.*CODE)', r'\1_LT_\2', text, flags=re.DOTALL)
What I'd like to happen: all occurrences of < between CODE and CODE should be replaced with _LT_.
After spending the day on stackoverflow and regex101.com, I'm starting to think either it's not possible or I'm not smart enough to handle this.
Any help is tremendously appreciated!
Thanks in advance.
Here is my answer:
text = """
Some stuff to keep <b>here</b>
CODE
<b>Replace gt and lt</b>
<i>inside <script>this</script> code</i>
CODE
Some more stuff to keep <b>here</b>
"""
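# Split once; odd-indexed parts are the text between a pair of CODE markers
parts = text.split('CODE')
for i in range(1, len(parts), 2):
    parts[i] = parts[i].replace('>', '_GT_').replace('<', '_LT_')
# split() removes the markers, so re-join with 'CODE' to restore them
output = 'CODE'.join(parts)
print(output)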
With this solution, every code block is formatted and added to the output. This does not use regex, but it works.
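The split-based idea can also be wrapped in a small helper, shown here as a minimal sketch rather than a definitive implementation. The function name `escape_between_markers`, its `marker` parameter, and the `sample` string are hypothetical names chosen for illustration. The sketch assumes the markers come in pairs.

```python
def escape_between_markers(text, marker="CODE"):
    """Escape < and > only in the segments enclosed by a pair of markers."""
    parts = text.split(marker)
    # parts[1], parts[3], ... lie between an opening and a closing marker
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace("<", "_LT_").replace(">", "_GT_")
    # split() dropped the markers, so join() puts them back
    return marker.join(parts)


# Two marked blocks with unmarked text before, between and after them
sample = "<a>\nCODE\n<b>\nCODE\n<c>\nCODE\n<d>\nCODE\n<e>\n"
print(escape_between_markers(sample))
```

If a marker is left unpaired, everything after it is treated as being inside a block, so this is only suitable when the input is known to be well formed.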
With regex:
import re
text = "\nSome stuff to keep <b>here</b>\n\nCODE\n<b>Replace gt and lt</b>\n<i>inside <script>this</script> code</i>\nCODE\n\nSome more stuff to keep <b>here</b>\n"
pattern = r"(?s)CODE.*?CODE"
print(re.sub(pattern, lambda x: x.group().replace('<','_LT_').replace('>','_GT_'), text))
See Python proof.
Results:
Some stuff to keep <b>here</b>
CODE
_LT_b_GT_Replace gt and lt_LT_/b_GT_
_LT_i_GT_inside _LT_script_GT_this_LT_/script_GT_ code_LT_/i_GT_
CODE
Some more stuff to keep <b>here</b>
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
  (?s)                     set flags for this block (with . matching
                           \n) (case-sensitive) (with ^ and $
                           matching normally) (matching whitespace
                           and # normally)
--------------------------------------------------------------------------------
  CODE                     'CODE'
--------------------------------------------------------------------------------
  .*?                      any character (0 or more times (matching
                           the least amount possible))
--------------------------------------------------------------------------------
  CODE                     'CODE'
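To see why "the least amount possible" matters, the sketch below compares the lazy `.*?` with a greedy `.*` on a string that contains two blocks. The helper `escape_blocks` and the sample string `two_blocks` are hypothetical names introduced only for this comparison.

```python
import re


def escape_blocks(text, pattern):
    """Escape < and > inside every span matched by pattern."""
    return re.sub(
        pattern,
        lambda m: m.group().replace("<", "_LT_").replace(">", "_GT_"),
        text,
    )


two_blocks = "<keep>\nCODE\n<a>\nCODE\n<middle>\nCODE\n<b>\nCODE\n<keep>\n"

# Lazy: each match ends at the nearest closing CODE, giving two separate blocks
lazy = escape_blocks(two_blocks, r"(?s)CODE.*?CODE")
# Greedy: one match runs from the first CODE to the last CODE
greedy = escape_blocks(two_blocks, r"(?s)CODE.*CODE")

print(lazy)
print(greedy)
```

With the lazy quantifier the text between the two blocks (`<middle>`) is left alone, while the greedy version escapes it as well because it treats everything from the first marker to the last as a single block.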
I'll update this answer in a few minutes with a regex-only solution but, meanwhile: wouldn't splitting the string and then joining the pieces be a solution?
# Pass flags by keyword: the fourth positional argument of re.sub is count
re.sub(regex, value, text.split("CODE\n")[1], flags=flags)
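A concrete version of this split-then-join idea might look like the sketch below. It assumes the text contains exactly one pair of "CODE\n" markers. The names `before`, `code`, `after` and `replacements`, and the `[<>]` pattern, are my own choices for illustration.

```python
import re

text = (
    "Some stuff to keep <b>here</b>\n"
    "CODE\n"
    "<b>Replace gt and lt</b>\n"
    "CODE\n"
    "Some more stuff to keep <b>here</b>\n"
)

# Assumes exactly two "CODE\n" markers, so split() yields three pieces
before, code, after = text.split("CODE\n")

replacements = {"<": "_LT_", ">": "_GT_"}
# Substitute only inside the middle piece
code = re.sub(r"[<>]", lambda m: replacements[m.group()], code)

# Join with the same separator to restore the markers
result = "CODE\n".join([before, code, after])
print(result)
```

The tuple unpacking raises a ValueError if the number of markers differs from two, which at least makes a violated assumption visible instead of silently producing wrong output.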
EDIT: I found the answer, but it's a little bit hacky. You can read the full description in this post: https://stackoverflow.com/a/11096811/8665327
Basically, the line you are looking for is this:
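# (?:(?!CODE).)* stops only at the literal word CODE; [^(CODE)]* would stop at any single (, C, O, D, E or )
text = re.sub(r'\nCODE\n(?:(?!CODE).)*\nCODE\n', lambda x: x.group(0).replace('<', '_LT_').replace('>', '_GT_'), text, flags=re.DOTALL)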
This works on text placed between "CODE" markers that each sit on their own line, as long as no "CODE" string appears between them:
will_work = """
<title>This will work</title>
CODE
<b>Replace this</b>
CODE
"""
wont_work = """
CODE
<b>This won't work</b>CODE
CODE
"""
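The two cases can be checked with a short script. This is a sketch under the assumption that the block pattern rejects the literal word CODE through a negative lookahead. The names `BLOCK` and `escape` are hypothetical.

```python
import re

# (?:(?!CODE).)* consumes any character that does not start the word "CODE"
BLOCK = re.compile(r"\nCODE\n(?:(?!CODE).)*\nCODE\n", flags=re.DOTALL)


def escape(text):
    """Escape < and > inside every block matched by BLOCK."""
    return BLOCK.sub(
        lambda m: m.group(0).replace("<", "_LT_").replace(">", "_GT_"), text
    )


will_work = "\n<title>This will work</title>\nCODE\n<b>Replace this</b>\nCODE\n"
wont_work = "\nCODE\n<b>This won't work</b>CODE\nCODE\n"

print(escape(will_work))
# The stray CODE inside the block prevents any match, so the text is unchanged
print(escape(wont_work) == wont_work)
```

Because nothing matches in the second case, the input comes back untouched rather than partially escaped.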