Python: How do I split this string using a regular expression?

Question

Simple one here, but I'm fairly new to Python.

I have a string like this:

this is page one of an article 
<!--pagebreak page two --> this is page two 
<!--pagebreak--> this is the third page 
<!--pagebreak page four --> last page
// newlines added for readability

I need to split the string with a regex on the pagebreak comments. The idea is that sometimes the comments have a 'title' (which I use in my templates), other times they don't.

I tried this:

re.split("<!--pagebreak*.?-->", str)

which returned only the items with 'titles' in the pagebreak (and didn't split them correctly either). What am I doing wrong here?

Answer 1

Change *.? into .*?:

re.split("<!--pagebreak.*?-->", str)

Your current regex accepts any number of literal k's, optionally followed by any single character.
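To see the difference concretely, here is a small sketch that runs both patterns against a string shaped like the one in the question. The variable names `s`, `broken`, and `fixed` are only illustrative.

```python
import re

# Sample text shaped like the string in the question.
s = ("this is page one of an article\n"
     "<!--pagebreak page two --> this is page two\n"
     "<!--pagebreak--> this is the third page\n"
     "<!--pagebreak page four --> last page")

broken = r"<!--pagebreak*.?-->"  # zero or more 'k', then at most one character
fixed = r"<!--pagebreak.*?-->"   # any characters, as few as possible

# The broken pattern only finds the comment that has no title.
print(re.findall(broken, s))  # ['<!--pagebreak-->']

# The fixed pattern finds all three comments.
print(re.findall(fixed, s))

broken_parts = re.split(broken, s)
fixed_parts = re.split(fixed, s)
print(len(broken_parts), len(fixed_parts))  # 2 4
```

With the broken pattern the string is cut only at the untitled comment, so the titled comments stay embedded in the two resulting pieces.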

Also, I would recommend using raw strings ( r"..." ) for your regular expressions. It's not necessary in this case, but it's an easy way to spare yourself a few headaches.
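As an illustration of the kind of headache raw strings avoid, consider the word-boundary escape \b. This example is not from the question; the `text`, `plain`, and `raw` names are made up for the sketch.

```python
import re

text = "one pagebreak two"

# In a normal string literal, "\b" is a single backspace character.
plain = "\bpagebreak\b"
# In a raw string the backslash survives, so the regex engine sees \b (word boundary).
raw = r"\bpagebreak\b"

print(len("\b"), len(r"\b"))   # 1 2
print(re.search(plain, text))  # None: the text contains no backspace characters
print(re.search(raw, text).group())  # pagebreak
```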

Answer 2

You swapped the . with the *. The correct regex is:

<!--pagebreak.*?-->

Answer 3

Definitely an issue of swapping the . and the *. The "." matches any character (except a newline, by default), and the asterisk means you take as many characters as you can get (limited, of course, by the non-greedy qualifier "?").

import re

s = """this is page one of an article 
<!--pagebreak page two --> this is page two 
<!--pagebreak--> this is the third page 
<!--pagebreak page four --> last page"""

# print() with parentheses works in both Python 2 and Python 3
print(re.split(r'<!--pagebreak.*?-->', s))

Outputs:

['this is page one of an article \n', ' this is page two \n', ' this is the third page \n', ' last page']
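The non-greedy "?" matters most when several pagebreak comments sit on the same line. A minimal sketch, using a made-up one-line string (`line` is not from the question):

```python
import re

# Hypothetical single-line input with two pagebreak comments.
line = "a <!--pagebreak one --> b <!--pagebreak--> c"

# Greedy: .* runs to the last "-->" on the line, swallowing the middle page.
greedy = re.split(r"<!--pagebreak.*-->", line)
print(greedy)  # ['a ', ' c']

# Non-greedy: .*? stops at the first "-->", so each comment is its own separator.
lazy = re.split(r"<!--pagebreak.*?-->", line)
print(lazy)  # ['a ', ' b ', ' c']
```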

Answer 1
2 Accepted 2012-10-04 08:41:03

Answer 2
2 2012-10-04 08:41:35

Answer 3
2 2012-10-04 08:52:22
