
Python Regular Expression returning nothing to repeat?

So, what I'm trying to do is parse a text file line by line into a list.

I've successfully done that. Now I need to pull out all the links that end with .html.

Therefore I thought I would go through each line and check whether it matched *.html, so I believe the best way to do this is regular expressions. Below is my code; the error in question is the "nothing to repeat" error. I've marked the line it refers to.

Code:

compiled = re.compile("*.html")  # Error here
[m.group(0) for l in content for m in [compiled.search(l)] if m]

Just for the record, I am trying to extract links that look like:

Nws_NewsDetails.aspx@Site_Id=2&lang=1&NewsID=148513&CatID=19&Type=Home&GType=1.html

But they could truly be random, hence the *.html

In a regular expression, * is a metacharacter with a special meaning: it repeats whatever comes before it. At the start of a pattern there is nothing before it to repeat, which is why you get the error. You can use the following regex instead:

re.compile(r".*\.html")

Here, .* means that any character can occur any number of times (0 or more), which is what * actually means in a regular expression. After that you want to match a literal dot. Because the dot also has a special meaning (it matches any character), we need to escape it with a backslash, so we match it with \.
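To tie the explanation together, here is a minimal sketch that reproduces the error and then applies the corrected pattern to the list comprehension from the question. The sample entries in content (other than the link quoted in the question) and the error_message variable are assumptions made for illustration only.

```python
import re

# Hypothetical sample lines standing in for the parsed file contents;
# only the first entry comes from the question.
content = [
    "Nws_NewsDetails.aspx@Site_Id=2&lang=1&NewsID=148513&CatID=19&Type=Home&GType=1.html",
    "styles/main.css",
    "about.html",
]

# A leading * has no preceding token to repeat, so compiling fails.
try:
    re.compile("*.html")
    error_message = None
except re.error as exc:
    error_message = str(exc)

# Raw string so the backslash reaches the regex engine unchanged.
compiled = re.compile(r".*\.html")

# Same comprehension as in the question, keeping only lines that match.
links = [m.group(0) for line in content for m in [compiled.search(line)] if m]
print(error_message)
print(links)
```

Note that this pattern matches .html anywhere in the line, not only at the end. If the link must end with .html, appending $ to the pattern would be one way to require that.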


 