Python Regular Expression returning nothing to repeat?

Question

So, what I'm trying to do is to parse a text file line by line into a list.

I've successfully done that. Now, I need to pull all the links that end with .html out.

My plan was to check each line against the pattern *.html, and regular expressions seemed like the best tool for that. My code is below; it fails with a "nothing to repeat" error. I've marked the line the error refers to.

Code:

import re

compiled = re.compile("*.html")  # Error here: "nothing to repeat"
[m.group(0) for l in content for m in [compiled.search(l)] if m]

Just for the record I am trying to extract links that look like:

Nws_NewsDetails.aspx@Site_Id=2&lang=1&NewsID=148513&CatID=19&Type=Home&GType=1.html

But they could truly be random, hence the *.html

Answer 1

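In a regular expression, * is a metacharacter: it is a quantifier that repeats the element immediately before it zero or more times. In "*.html" it is the first character of the pattern, so there is nothing for it to repeat, and that is why you get the error. You can use the following regex instead:

re.compile(r".*\.html")

Here, .* means that any character (except a newline) can occur any number of times (zero or more), which is what you intended with the bare *. The literal dot before html is matched with \. because an unescaped dot also has a special meaning (it matches any character), so it has to be escaped with a backslash. The r prefix makes the pattern a raw string, so Python passes the backslash through to the regex engine unchanged.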
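
As a rough illustration of the explanation above, the sketch below first reproduces the error and then applies the corrected pattern to a few sample lines. The `content` list here is a made-up stand-in for the asker's parsed file, and the trailing `$` anchor is an addition that is not part of the answer; it assumes the link should end the line. The exact wording of the error message may vary between Python versions.

```python
import re

# The original pattern: "*" is a quantifier with nothing before it to repeat.
try:
    re.compile("*.html")
    error_message = None
except re.error as exc:
    error_message = str(exc)

# Corrected pattern as a raw string. The "$" anchor is an assumption added
# here so that only lines ending in ".html" match.
html_link = re.compile(r".*\.html$")

# Hypothetical sample data standing in for the asker's `content` list.
content = [
    "Nws_NewsDetails.aspx@Site_Id=2&lang=1&NewsID=148513&CatID=19&Type=Home&GType=1.html",
    "index.aspx",
    "styles.css",
    "about.html",
]

# Same list comprehension as in the question, using the corrected pattern.
links = [m.group(0) for line in content for m in [html_link.search(line)] if m]

# A plain string check gives the same result here without a regex.
links_simple = [s for s in (line.strip() for line in content) if s.endswith(".html")]
```

If the only requirement is "ends with .html", the `str.endswith` version is probably the simpler choice; the regex becomes worthwhile once the link has to be pulled out of surrounding text.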

solution1
4 ACCEPTED 2014-04-22 04:20:54
