python: how to split this string with a regex?
Simple one here, but I'm fairly new to Python.
I have a string like this:
this is page one of an article
<!--pagebreak page two --> this is page two
<!--pagebreak--> this is the third page
<!--pagebreak page four --> last page
// newlines added for readability
I need to split the string using this regex: <!--pagebreak(*.?)-->
The idea is that sometimes the <!--pagebreak--> comments have a 'title' (which I use in my templates), other times they don't.
I tried this:
re.split("<!--pagebreak*.?-->", str)
which returned only the items with 'titles' in the pagebreak (and didn't split them correctly either). What am I doing wrong here?
Change *.? to .*? like this:
re.split("<!--pagebreak.*?-->", str)
Your current regex matches <!--pagebrea followed by any number of literal k's, optionally followed by one arbitrary character, and then -->.
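To see the difference concretely, here is a small sketch that runs both patterns over the sample string from the question. The variable names (text, broken, fixed) are only for illustration.

```python
import re

# Sample text from the question, one page per line.
text = ("this is page one of an article\n"
        "<!--pagebreak page two --> this is page two\n"
        "<!--pagebreak--> this is the third page\n"
        "<!--pagebreak page four --> last page")

# Original pattern: zero or more "k", then at most one arbitrary character.
broken = re.compile(r"<!--pagebreak*.?-->")
# Corrected pattern: any run of characters, as short as possible.
fixed = re.compile(r"<!--pagebreak.*?-->")

broken_matches = broken.findall(text)
fixed_matches = fixed.findall(text)

print(broken_matches)  # only the comment without a title
print(fixed_matches)   # all three comments
```

The broken pattern only finds the untitled comment, so the titled comments stay embedded in the split results.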
Also, I would recommend using raw strings (r"...") for your regular expressions. It's not necessary in this case, but it's an easy way to spare yourself a few headaches.
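As an illustration of the kind of headache raw strings avoid, consider a pattern containing \b, which is not part of the question's regex and is used here only as an example. In a normal string Python turns \b into a backspace character before the regex engine ever sees it.

```python
import re

plain = "\bpage\b"   # Python converts each \b into a backspace character
raw = r"\bpage\b"    # the regex engine receives \b as a word boundary

sample = "page one, pagebreak"

plain_hits = re.findall(plain, sample)
raw_hits = re.findall(raw, sample)

print(len(plain), len(raw))  # the two pattern strings differ in length
print(plain_hits)            # no backspace characters in sample
print(raw_hits)              # the standalone word "page"
```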
You swapped the . with the *. The correct regex is:
<!--pagebreak.*?-->
Definitely an issue of swapping the . and the *. The "." matches any character (except a newline, by default) and the asterisk indicates that you'll take as many characters as you can get (limited, of course, by the non-greedy qualifier "?").
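The effect of the "?" qualifier is easiest to see when two comments sit on the same line. The following sketch uses a made-up one-line string (not from the question) to compare the greedy and non-greedy forms.

```python
import re

# Hypothetical single-line input with two pagebreak comments.
line = "one <!--pagebreak a--> two <!--pagebreak b--> three"

# Greedy: .* runs to the LAST "-->" on the line, swallowing " two ".
greedy = re.split(r"<!--pagebreak.*-->", line)

# Non-greedy: .*? stops at the FIRST "-->" after each comment start.
lazy = re.split(r"<!--pagebreak.*?-->", line)

print(greedy)
print(lazy)
```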
import re
s = """this is page one of an article
<!--pagebreak page two --> this is page two
<!--pagebreak--> this is the third page
<!--pagebreak page four --> last page"""
print(re.split(r'<!--pagebreak.*?-->', s))
Outputs:
['this is page one of an article\n', ' this is page two\n', ' this is the third page\n', ' last page']
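Since the question mentions using the titles in templates, one possible extension is to keep the parentheses from the original attempt. When the pattern contains a capturing group, re.split also returns the captured text between the pieces. The pairing logic and the names pages and titles below are assumptions for illustration, not part of the answers above.

```python
import re

s = ("this is page one of an article\n"
     "<!--pagebreak page two --> this is page two\n"
     "<!--pagebreak--> this is the third page\n"
     "<!--pagebreak page four --> last page")

# With a capturing group, the result alternates: page, title, page, title, ...
parts = re.split(r"<!--pagebreak(.*?)-->", s)

pages = parts[0::2]
# The first page has no preceding comment; empty titles become None.
titles = [None] + [t.strip() or None for t in parts[1::2]]

for title, page in zip(titles, pages):
    print(repr(title), repr(page))
```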