Scraping Clean Scientific Names Without Parentheses Using Regex

Question

I'm scraping scientific names from a website using regex, and I can't figure out how to avoid capturing the parentheses along with the scientific name.

The HTML is written like this:

<span class="SciName">(Acanthastrea bowerbanki)</span>

My regex is written like this:

regex = '<span class="SciName">(.+?)</span>'

My results look like this:

(Acanthastrea bowerbanki)

But I need them to look like this:

Acanthastrea bowerbanki

Answer 1

You need to match the surrounding parentheses outside the capture group. Escape them with backslashes so the regex engine treats them as literal characters instead of as another group:

regex = r'<span class="SciName">\((.+?)\)</span>'

You would use it like this:

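import re

text = '<span class="SciName">(Acanthastrea bowerbanki)</span>'
# \( and \) match the literal parentheses; (.+?) captures only the name between them
regex = r'<span class="SciName">\((.+?)\)</span>'
# re.search scans the whole string; re.match only succeeds at position 0
m = re.search(regex, text)
if m:
    print(m.group(1))  # Acanthastrea bowerbanki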
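When a page contains several of these spans, re.findall returns every captured name in one call. A minimal sketch, assuming a hypothetical html string standing in for the downloaded page:

```python
import re

# Hypothetical page snippet standing in for the scraped HTML
html = (
    '<span class="SciName">(Acanthastrea bowerbanki)</span>'
    '<p>other markup</p>'
    '<span class="SciName">(Acanthastrea echinata)</span>'
)

# Same pattern as above: the escaped parentheses are matched but not captured
pattern = re.compile(r'<span class="SciName">\((.+?)\)</span>')

# With exactly one capture group, findall returns a list of the captured strings
names = pattern.findall(html)
print(names)
```

Compiling the pattern once is optional, but it keeps the intent clear if you run it over many pages.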

Answer 2

You don't need a regex for this; plain string slicing works:

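s = 'blah blah blah (Acanthastrea bowerbanki) blah blah blah'

# Slice from just after the first "(" up to, but not including, the first ")"
scientificName = s[s.find("(") + 1:s.find(")")]
print(scientificName)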
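One caveat with the slicing approach: str.find returns -1 when the character is missing, so the slice quietly returns the wrong text instead of failing. A slightly more defensive sketch uses str.partition; the helper name name_between_parens is hypothetical:

```python
def name_between_parens(s):
    """Return the text between the first '(' and the next ')', or None."""
    # partition splits on the first occurrence; sep is '' when '(' is absent
    _, sep, rest = s.partition("(")
    if not sep:
        return None
    # Look for ')' only in the text after '('
    inner, sep, _ = rest.partition(")")
    if not sep:
        return None
    return inner


print(name_between_parens('blah blah blah (Acanthastrea bowerbanki) blah blah blah'))
print(name_between_parens('no parentheses here'))
```

Because the closing parenthesis is searched for only after the opening one, a stray ")" earlier in the string cannot produce an empty result.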

2 answers

Answer 1: 3 votes, accepted, posted 2013-10-31 21:22:53

Answer 2: 0 votes, posted 2013-10-31 21:25:05