Extract emails from html using regex

Question

I'm trying to extract any Jabber accounts (they look like email addresses) from this page using a regex.

I've tried using regex:

\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+

...but it's not producing the desired results.
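The question does not say which results were wrong, so the following is only a sketch of what the pattern itself does, run with Python 3 on an invented string (the sample text and the addresses in it are assumptions, not taken from the page). It shows one limitation visible from the pattern alone: `\w+` cannot cross a dot or hyphen in the part before the `@`, so such an address is cut short, while the `{a, b}@domain` alternative matches as a single item.

```python
import re

# The pattern from the question, unchanged.
QUESTION_RE = re.compile(r'\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+')

# Invented sample text for illustration only.
sample = 'first.last@example.org and {alice, bob}@example.net'

# The only group is non-capturing, so findall returns the full matches.
matches = QUESTION_RE.findall(sample)
print(matches)  # ['last@example.org', '{alice, bob}@example.net']
```

Note that "first." is lost from the first address, and the braced form is returned as one string rather than expanded into two accounts.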

Answer 1

This might work:

[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+

import re

# No ^ or $ and no literal letters in the pattern, so re.MULTILINE and re.IGNORECASE are not needed.
p = re.compile(r'[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+')
test_str = r'...'
p.findall(test_str)

See example.
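To make the behaviour of this pattern concrete, here is a minimal Python 3 sketch on a made-up HTML fragment (the markup and addresses are hypothetical, not the page from the question). The pattern only excludes whitespace, `@`, `<` and `>`, so it stops cleanly at tags but will keep adjacent punctuation such as a trailing comma.

```python
import re

# Answer 1's pattern: runs of characters other than whitespace, '@', '<', '>'
# around a single '@', with at least one dot somewhere after the '@'.
EMAIL_RE = re.compile(r'[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+')

# Hypothetical markup, invented for illustration.
sample_html = '<li>Jabber: <b>alice@example.org</b></li><li>bob.smith@chat.example.net</li>'
in_tags = EMAIL_RE.findall(sample_html)
print(in_tags)  # ['alice@example.org', 'bob.smith@chat.example.net']

# Limitation: punctuation next to the address is not excluded by the pattern.
in_prose = EMAIL_RE.findall('Contact carol@example.com, or ask on IRC')
print(in_prose)  # ['carol@example.com,']
```

If the page puts addresses in running text, stripping trailing punctuation from each match afterwards is one possible workaround.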

Answer 2

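# -*- coding: utf-8 -*-
# Python 2 code (print statement, str.decode).
import re

s = '''
...YOUR HTML page source code HERE..........

'''

# Word-bounded local part, '@', domain, then a 2-6 letter TLD; case-insensitive.
reobj = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)
print reobj.findall(s.decode('utf-8'))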

Result

[u'skypeman@jabbim.cz', u'sonics@creep.im', u'voxis_team@lsd-25.ru', u'voxis_team@lsd-25.ru', u'adhrann@jabbim.cz', u'jabberwocky@jabber.systemli.org']
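The result above lists voxis_team@lsd-25.ru twice, because `findall` returns every occurrence. If unique accounts are wanted, a Python 3 sketch along these lines might help. The helper name `extract_unique` and the sample markup are assumptions made for illustration; only the pattern and the two addresses come from this answer.

```python
import re

# Same pattern as in this answer.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)

def extract_unique(text):
    """Return matches in first-seen order without duplicates (hypothetical helper)."""
    # dict.fromkeys keeps insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(EMAIL_RE.findall(text)))

# Invented markup wrapping two addresses from the result list.
sample = "<td>voxis_team@lsd-25.ru</td><td>voxis_team@lsd-25.ru</td><td>skypeman@jabbim.cz</td>"
unique = extract_unique(sample)
print(unique)  # ['voxis_team@lsd-25.ru', 'skypeman@jabbim.cz']
```

In Python 3 the page source is already a `str` once decoded, so the `s.decode('utf-8')` step is only needed when starting from `bytes`.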

Answer 3

Try this:

reg_emails=r'^((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))@((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))\.((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))$'
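One thing to watch with this pattern: it starts with `^` and ends with `$`, so as written it checks whether a whole string is one address and will not find addresses embedded in a page. A minimal Python 3 sketch, assuming it is acceptable to drop the two anchors for extraction (the surrounding sample text is invented):

```python
import re

# Answer 3's pattern, copied unchanged, including the ^ and $ anchors.
ANCHORED = r'^((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))@((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))\.((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))$'
# Assumption: removing the first and last character (the anchors) is what we want for extraction.
UNANCHORED = ANCHORED[1:-1]

text = 'Jabber: voxis_team@lsd-25.ru (invented surrounding text)'

# Anchored: the string as a whole is not a single address, so nothing is found.
anchored_hit = re.search(ANCHORED, text)
print(anchored_hit)  # None

# The pattern contains capturing groups, so use finditer and group(0)
# to get the full match instead of findall's tuples of groups.
found = [m.group(0) for m in re.finditer(UNANCHORED, text)]
print(found)  # ['voxis_team@lsd-25.ru']
```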

3 answers

solution1
4 ACCEPTED 2015-03-05 21:40:29

solution2
3 2015-03-06 00:18:54

solution3
0 2017-09-10 08:48:43
