
Python: how to split this string with a regex?

This is probably a simple one, but I'm fairly new to Python.

I have a string like this:

this is page one of an article 
<!--pagebreak page two --> this is page two 
<!--pagebreak--> this is the third page 
<!--pagebreak page four --> last page
// newlines added for readability

I need to split the string using this regex: <!--pagebreak(*.?)-->. The idea is that sometimes the <!--pagebreak--> comments have a 'title' (which I use in my templates) and other times they don't.

I tried this:

re.split("<!--pagebreak*.?-->", str)

which returned only the items with 'titles' in the pagebreak (and didn't split them correctly either). What am I doing wrong here?

Change *.? into .*? like this:

re.split("<!--pagebreak.*?-->", str)

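Your current regex accepts any number of literal k's, optionally followed by a single character of any kind: the * applies to the k before it, and .? matches zero or one arbitrary character. A titled comment such as <!--pagebreak page two --> has more than one character between pagebreak and -->, so it never matches.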
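To see what the original pattern actually matches, here is a small sketch that tries both patterns against the three comment styles and then splits the sample text with the broken one. The names BROKEN, FIXED and comments are illustrative only, and s reproduces the question's text with real newlines.

```python
import re

BROKEN = r"<!--pagebreak*.?-->"  # k* = zero or more k's, .? = at most one character
FIXED = r"<!--pagebreak.*?-->"   # .*? = any run of characters, as few as possible

comments = [
    "<!--pagebreak-->",
    "<!--pagebreakkkk-->",
    "<!--pagebreak page two -->",
]

for c in comments:
    # fullmatch succeeds only if the pattern covers the whole comment
    print(repr(c), bool(re.fullmatch(BROKEN, c)), bool(re.fullmatch(FIXED, c)))
# '<!--pagebreak-->' True True
# '<!--pagebreakkkk-->' True True
# '<!--pagebreak page two -->' False True

s = ("this is page one of an article \n"
     "<!--pagebreak page two --> this is page two \n"
     "<!--pagebreak--> this is the third page \n"
     "<!--pagebreak page four --> last page")

# With the broken pattern only the untitled comment acts as a delimiter,
# so the titled comments stay embedded inside the two resulting pieces.
broken_parts = re.split(BROKEN, s)
print(len(broken_parts))  # 2
```

This is consistent with what the question describes: the titled comments survive inside the pieces instead of being used as split points.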

Also, I would recommend using raw strings (r"...") for your regular expressions. It's not necessary in this case, but it's an easy way to spare yourself a few headaches.
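A quick illustration of the kind of headache raw strings avoid: in an ordinary string literal, \b is turned into a backspace character before re ever sees it, whereas in a raw string the backslash survives and re reads \b as a word boundary. The sample text is made up for illustration.

```python
import re

text = "this is page two"

plain = "\bpage\b"  # Python turns each \b into a backspace character (\x08)
raw = r"\bpage\b"   # backslashes are kept, so re sees word-boundary assertions

print(re.findall(plain, text))  # []
print(re.findall(raw, text))    # ['page']

# The pagebreak pattern contains no backslashes, so both spellings are identical.
print("<!--pagebreak.*?-->" == r"<!--pagebreak.*?-->")  # True
```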

You swapped the . and the *. The correct regex is:

<!--pagebreak.*?-->

Definitely an issue of swapping the . and the *. The "." matches any character (except a newline, by default), and the asterisk repeats it so it can consume as many characters as needed. The "?" after the asterisk makes that repetition non-greedy: it takes as few characters as possible while still reaching the "-->" that closes the comment.
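The effect of the "?" is easiest to see on a single-line string like the one in the question (where the newlines were only added for readability). Without it, .* runs to the end of the line and then backs up only as far as the last "-->", so two page breaks collapse into a single delimiter. The short sample string is invented for illustration.

```python
import re

one_line = "page one <!--pagebreak page two --> page two <!--pagebreak--> page three"

# Greedy: .* runs to the end of the line, then backtracks to the LAST "-->"
greedy = re.split(r"<!--pagebreak.*-->", one_line)

# Non-greedy: .*? stops at the FIRST "-->" that completes a match
lazy = re.split(r"<!--pagebreak.*?-->", one_line)

print(greedy)  # ['page one ', ' page three']
print(lazy)    # ['page one ', ' page two ', ' page three']
```

Because "." does not match a newline unless re.DOTALL is set, the greedy version happens to work on a multi-line string like s below, where each comment sits on its own line, but it breaks as soon as two page breaks share a line.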

import re

s = """this is page one of an article 
<!--pagebreak page two --> this is page two 
<!--pagebreak--> this is the third page 
<!--pagebreak page four --> last page"""

# Split on every pagebreak comment, with or without a title (Python 3 print)
print(re.split(r'<!--pagebreak.*?-->', s))

Output:

['this is page one of an article \n', ' this is page two \n', ' this is the third page \n', ' last page']
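Since the question mentions that the titles are used in templates, it may help to capture them instead of discarding them. When the pattern contains a capturing group, re.split includes the captured text in the result list, so titles and page bodies alternate. This sketch assumes a title is whatever sits between pagebreak and -->, trimmed of surrounding whitespace; the helper name split_pages and the (title, body) pair layout are hypothetical choices, not something from the original thread.

```python
import re

# Group 1 captures the optional title; the \s* on each side trims spaces around it.
PAGEBREAK = re.compile(r"<!--pagebreak\s*(.*?)\s*-->")


def split_pages(text):
    """Hypothetical helper: return (title, body) pairs, title None when absent."""
    parts = PAGEBREAK.split(text)
    # parts looks like [body0, title1, body1, title2, body2, ...]
    pages = [(None, parts[0].strip())]  # the first page never has a title
    for title, body in zip(parts[1::2], parts[2::2]):
        # An untitled comment captures "", which we normalise to None
        pages.append((title or None, body.strip()))
    return pages


s = """this is page one of an article
<!--pagebreak page two --> this is page two
<!--pagebreak--> this is the third page
<!--pagebreak page four --> last page"""

for title, body in split_pages(s):
    print(title, "|", body)
# None | this is page one of an article
# page two | this is page two
# None | this is the third page
# page four | last page
```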

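Disclaimer: The technical posts on this site are distributed under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.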

 