
Python regular expression returns "nothing to repeat"?

So, what I'm trying to do is to parse a text file line by line into a list.

I've successfully done that. Now I need to pull out all the links that end with .html.

So I thought I would check whether each line matched *.html, and I believe the best way to do that is with regular expressions. Below is my code; the problem is an error saying "nothing to repeat". I've marked the line it refers to.

Code:

import re

compiled = re.compile("*.html")  # Error here: "nothing to repeat"
matches = [m.group(0) for l in content for m in [compiled.search(l)] if m]
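Running the compile call on its own is enough to reproduce the problem: re.compile checks the pattern immediately and raises re.error, because the leading * has no preceding item to repeat. A minimal sketch (the variable names pattern and message are just for illustration):

```python
import re

pattern = "*.html"  # glob-style wildcard, not valid regular-expression syntax

try:
    re.compile(pattern)
except re.error as exc:
    # The message starts with "nothing to repeat"; Python 3 also appends the position.
    message = str(exc)
    print(message)
else:
    message = ""
```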

Just for the record, I am trying to extract links that look like:

Nws_NewsDetails.aspx@Site_Id=2&lang=1&NewsID=148513&CatID=19&Type=Home&GType=1.html

But they could be completely random, hence the *.html.
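The *.html notation is shell-style (glob) wildcard syntax rather than regular-expression syntax. As an alternative to a regex, Python's standard fnmatch module accepts that syntax directly. This sketch swaps in fnmatch.fnmatchcase (the case-sensitive variant) and uses made-up sample lines in place of the real file contents:

```python
from fnmatch import fnmatchcase

# Hypothetical sample lines standing in for the parsed text file.
lines = [
    "Nws_NewsDetails.aspx@Site_Id=2&lang=1&NewsID=148513&CatID=19&Type=Home&GType=1.html",
    "index.php",
    "contact.html",
]

# fnmatchcase understands shell-style wildcards, so "*.html" works as written.
html_links = [line for line in lines if fnmatchcase(line, "*.html")]
print(html_links)
```

Note that fnmatchcase matches against the whole line, so a line only qualifies if it ends in .html.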

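In a regular expression, * is a metacharacter with a special meaning: it repeats the preceding item zero or more times. In "*.html" there is nothing before the * for it to repeat, which is why you get the error. You can use the following regex instead: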

re.compile(r".*\.html")
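Plugged into the list comprehension from the question, the corrected pattern could be used as below. This is a minimal sketch: content is a hypothetical list of lines standing in for the parsed text file, and the raw string (r"...") keeps Python from treating \. as a string escape.

```python
import re

# Raw string so that "\." reaches the regex engine as an escaped dot.
compiled = re.compile(r".*\.html")

# Hypothetical sample lines standing in for the parsed text file.
content = [
    "Nws_NewsDetails.aspx@Site_Id=2&lang=1&NewsID=148513&CatID=19&Type=Home&GType=1.html",
    "index.php?page=2",
    "about.html",
]

# Same comprehension as in the question: keep the matched text of every matching line.
links = [m.group(0) for l in content for m in [compiled.search(l)] if m]
print(links)
```

Because .* is greedy, m.group(0) runs from the start of the line up to the last .html on it. If each line holds exactly one link with no trailing whitespace, [l for l in content if l.endswith(".html")] does the same job without a regex.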

Here, .* means any character (other than a newline) occurring any number of times, i.e. zero or more (that is what * actually means in a regular expression). After that you want to match a literal dot, and because the dot also has a special meaning (it matches any character), it has to be escaped with a backslash, giving \. in the pattern.
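To see why the escape matters, the sketch below compares an unescaped dot, the escaped version from the answer, and an end-anchored variant. The $ anchor is an addition here, assuming "end with .html" should mean the very end of the line; the sample strings are made up.

```python
import re

# Unescaped dot: "." matches any character, so "pagexhtml" is accepted.
loose = re.compile(r".*.html")
# Escaped dot: only a literal "." followed by "html" matches.
strict = re.compile(r".*\.html")
# "$" additionally requires ".html" to sit at the end of the line (assumption).
anchored = re.compile(r".*\.html$")

samples = ["page.html", "pagexhtml", "page.html.bak"]
for s in samples:
    print(s, bool(loose.search(s)), bool(strict.search(s)), bool(anchored.search(s)))
```

Without the anchor, re.search still finds "page.html" inside "page.html.bak", which may or may not be what you want.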
