I'm scraping scientific names from a website using regex, and I can't figure out how to avoid capturing the parentheses around the scientific name.
The HTML is written like this:
<span class="SciName">(Acanthastrea bowerbanki)</span>
My regex is written like this:
regex = '<span class="SciName">(.+?)</span>'
My results look like this:
(Acanthastrea bowerbanki)
But I need them to look like this:
Acanthastrea bowerbanki
You need an extra pair of parentheses outside the capture group, escaped with backslashes so the regex treats them as literal characters rather than as grouping:
regex = r'<span class="SciName">\((.+?)\)</span>'
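You can use it like this:
import re
text = '<span class="SciName">(Acanthastrea bowerbanki)</span>'
regex = r'<span class="SciName">\((.+?)\)</span>'
# re.match only matches at the start of the string; use re.search or re.findall on a whole page
m = re.match(regex, text)
print(m.group(1))  # Acanthastrea bowerbanki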
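If the page contains many such spans, something like the following may be closer to what a scraper needs. This is only a sketch: the `html` string is made-up sample data standing in for the downloaded page, and `re.findall` is used instead of `re.match` so that every occurrence is collected, not just one at the start of the string.

```python
import re

# Sample fragment standing in for the real page (assumed data, not from the site)
html = (
    '<span class="SciName">(Acanthastrea bowerbanki)</span>'
    '<span class="SciName">(Acanthastrea echinata)</span>'
)

# Escaped \( and \) match the literal parentheses; the inner group captures the name
pattern = re.compile(r'<span class="SciName">\((.+?)\)</span>')

# With a single capture group, findall returns a list of the captured strings
names = pattern.findall(html)
print(names)
```

With one capture group in the pattern, `findall` returns just the captured text for each match, so the parentheses never end up in the results.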
You don't need to use regex for this.
s = 'blah blah blah (Acanthastrea bowerbanki) blah blah blah'
# Slice from just after the first "(" up to the first ")"
scientific_name = s[s.find("(")+1:s.find(")")]
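One caveat with the slicing approach: `str.find` returns -1 when the character is missing, so the slice quietly produces a wrong result instead of failing. A slightly more defensive version might look like this; the helper name `between_parens` is hypothetical, chosen only for illustration.

```python
def between_parens(s):
    """Return the text between the first "(" and the next ")", or None if either is missing."""
    start = s.find("(")
    if start == -1:
        return None
    # Search for ")" only after the opening parenthesis
    end = s.find(")", start + 1)
    if end == -1:
        return None
    return s[start + 1:end]


print(between_parens('blah blah blah (Acanthastrea bowerbanki) blah blah blah'))
print(between_parens('no parentheses here'))
```

Returning `None` lets the caller skip entries that don't have the expected format.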