How do I find whole words using regular expressions in Python?
I use Beautiful Soup and the re library to parse a document. In the soup, I need to find all content after the word 'E-mail'. I tried:
for sublink in link.findAll(text = re.compile("[E-mail:0-9a-zA-Z]")):
    print sublink.encode('utf-8')
But it does not work.
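The pattern above fails because square brackets build a character class: [E-mail:0-9a-zA-Z] matches any single character from that set, not the literal word, and inside the brackets "E-m" is even read as a range from E to m. Beautiful Soup applies a compiled regex filter with search(), so almost every text node that contains a letter or digit is selected. A small Python 3 sketch, using plain strings instead of soup nodes, shows the difference:

```python
import re

# The question's pattern: a character class that matches ONE character from the set.
# Inside [...], "E-m" is a range (E..m), so the hyphen itself is not in the class.
char_class = re.compile("[E-mail:0-9a-zA-Z]")
# The intended pattern: the literal label "E-mail:" followed by the address.
literal = re.compile(r"E-mail:([\w@.-]+)")

line = "Phone: 991"  # contains no e-mail address at all
print(char_class.search(line) is not None)  # True: any letter or digit is enough
print(literal.search(line) is not None)     # False

# findall returns single characters, not words.
print(char_class.findall("E-mail:mail@domain.de")[:3])  # ['E', 'm', 'a']
print(char_class.search("-") is None)  # True: "-" was consumed by the E-m range
```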
Here is a working example that extracts the text following 'E-mail:' with a regular expression:
import re

text = ("First line\n"
        "Second line\n"
        "Important line! E-mail:mail@domain.de, Phone:991\n"
        "Another important line! E-mail:tom@gmail.com, Phone:001\n"
        "Another line")

# Capture the run of word characters, '@', '.' and '-' that follows each "E-mail:" label
emails = re.findall(r"E-mail:([\w@.-]+)", text)
print("Found email(s): " + ", ".join(emails))
Output:
Found email(s): mail@domain.de, tom@gmail.com
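Since the title asks about whole words: the usual tool for that is the word-boundary anchor \b, which matches only at a transition between a word character and a non-word character (or the edge of the string). A short sketch with made-up sample text:

```python
import re

# Made-up sample: one real "E-mail:" label and the longer word "E-mails".
text = "Important line! E-mail:mail@domain.de\nE-mails are listed below"

# A plain pattern also matches the beginning of "E-mails".
plain = re.findall(r"E-mail", text)
# The trailing \b requires a non-word character (here ':') right after the 'l',
# so the "E-mail" inside "E-mails" is skipped.
whole = re.findall(r"\bE-mail\b", text)
# Combined with the capture group used in the answer:
addresses = re.findall(r"\bE-mail:([\w@.-]+)", text)

print(plain)      # ['E-mail', 'E-mail']
print(whole)      # ['E-mail']
print(addresses)  # ['mail@domain.de']
```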
Not sure if that's what you are looking for.
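To connect this back to Beautiful Soup: bs4 filters text nodes with a compiled regex by calling its search() method, so a pattern for the label itself selects the nodes that contain it, and findall() on each node then pulls out the address. The sketch below assumes the nodes have already been collected; text_nodes and emails_in are hypothetical names, and plain strings stand in for the soup's text nodes (in bs4 on Python 3 those are str subclasses).

```python
import re

# Hypothetical stand-in for text nodes. With bs4 they might come from something like
#   soup.find_all(string=re.compile(r"E-mail:"))   # older code: findAll(text=...)
text_nodes = [
    "First line",
    "Important line! E-mail:mail@domain.de, Phone:991",
    "Another important line! E-mail:tom@gmail.com, Phone:001",
]

LABEL = re.compile(r"E-mail:")              # selects nodes containing the label
ADDRESS = re.compile(r"E-mail:([\w@.-]+)")  # extracts the text after the label


def emails_in(nodes):
    """Return every address that follows an 'E-mail:' label in the given strings."""
    found = []
    for node in nodes:
        if LABEL.search(node):  # mirrors the search()-based filter bs4 uses for a regex
            found.extend(ADDRESS.findall(node))
    return found


print(emails_in(text_nodes))  # ['mail@domain.de', 'tom@gmail.com']
```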
Edit: The characters 0-9a-zA-Z can be written as \w (which also matches the underscore). And yes, I added . and - as well. If an address can contain more characters, simply put them into [\w@.-].
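A quick check of the points in the edit, with made-up sample strings: besides the underscore, \w also matches non-ASCII letters for str patterns in Python 3 unless the re.ASCII flag is given, and adding a character such as '+' (picked here only as an example) to the class lets the match continue past it.

```python
import re

# \w vs an explicit ASCII letter/digit class.
print(re.fullmatch(r"[0-9a-zA-Z]+", "user_name"))            # None: '_' not in the class
print(re.fullmatch(r"\w+", "user_name") is not None)         # True: \w includes '_'
print(re.fullmatch(r"\w+", "müller") is not None)            # True: Unicode letters count
print(re.fullmatch(r"\w+", "müller", re.ASCII) is not None)  # False with re.ASCII

# Extending the class: '+' is not in [\w@.-], so the first match stops early.
line = "E-mail:first.last+news@example.org, Phone:123"
narrow = re.findall(r"E-mail:([\w@.-]+)", line)
wider = re.findall(r"E-mail:([\w@.+-]+)", line)  # '-' kept last so it stays literal
print(narrow)  # ['first.last']
print(wider)   # ['first.last+news@example.org']
```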