Input (comma-separated list):
"\"Mr ABC\" <mr@abc.com>, \"Foo, Bar\" <foo@bar.com>, mr@xyz.com"
Expected output (list of 2-tuples):
[("Mr ABC", "mr@abc.com"), ("Foo, Bar", "foo@bar.com"), ("", "mr@xyz.com")]
I could split the string on commas and pass each piece to email.utils.parseaddr(address), but then I realized that the name part can also contain a comma, as in "Foo, Bar" above.
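A quick check of why plain comma splitting breaks on this input: the comma inside the quoted name "Foo, Bar" yields an extra fragment, so three addresses become four pieces before parseaddr ever sees them.

```python
header = '"Mr ABC" <mr@abc.com>, "Foo, Bar" <foo@bar.com>, mr@xyz.com'

# Naive approach: split on every comma.
pieces = header.split(",")

print(len(pieces))        # 4 fragments for only 3 addresses
print(repr(pieces[1]))    # ' "Foo' - half of the quoted display name
```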
email.utils.getaddresses(fieldvalues) is very close to what I need, but it accepts a sequence, not a comma-separated string.
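Because getaddresses() joins the values in the sequence it receives and then parses the result as one address list, one way around the "sequence, not string" restriction is to wrap the whole header in a one-element list. A minimal sketch using only the standard library:

```python
from email.utils import getaddresses

header = '"Mr ABC" <mr@abc.com>, "Foo, Bar" <foo@bar.com>, mr@xyz.com'

# getaddresses() expects a sequence of header values, so pass a
# one-element list; the comma inside the quoted "Foo, Bar" is respected.
pairs = getaddresses([header])
print(pairs)
# [('Mr ABC', 'mr@abc.com'), ('Foo, Bar', 'foo@bar.com'), ('', 'mr@xyz.com')]
```

This keeps address parsing in the email package rather than in a hand-written regex.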
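You may use the following regex. Note that the test string quotes the last address ("mr@xyz.com"), since the pattern only matches entries that begin with a double quote: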
import re
p = re.compile(r'"([^"]+)"(?:\s+<([^<>]+)>)?')
test_str = '"Mr ABC" <mr@abc.com>, "Foo, Bar" <foo@bar.com>, "mr@xyz.com"'
print(re.findall(p, test_str))
Output: [('Mr ABC', 'mr@abc.com'), ('Foo, Bar', 'foo@bar.com'), ('mr@xyz.com', '')]
The regex matches...
- " - a double quote
- ([^"]+) - (Group 1) 1 or more characters other than a double quote
- " - a double quote
Then, an optional non-capturing group is introduced with the (?:...)? construct: (?:\s+<([^<>]+)>)?. It matches...
- \s+ - 1 or more whitespace characters
- < - an opening angle bracket
- ([^<>]+) - (Group 2) 1 or more characters other than opening or closing angle brackets
- > - a closing angle bracket
The re.findall function gets all capture groups into a list of tuples:
"If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group."
UPDATE:
If you need the email address to always be the second element of the tuple, use this code:
lst = re.findall(p, test_str)
print([(tpl[1], tpl[0]) if not tpl[1] else tpl for tpl in lst])
# => [('Mr ABC', 'mr@abc.com'), ('Foo, Bar', 'foo@bar.com'), ('', 'mr@xyz.com')]
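If the input is exactly as in the question, with the last address unquoted and no display name, a variation can make the quoted name optional instead, so each tuple comes out as (name, email) without post-processing. This is a hypothetical sketch, not part of the answer above; its simplified "local@domain" token is an assumption and does not cover full RFC 5322 address syntax.

```python
import re

# Hypothetical variant: the quoted name (Group 1) is optional, and the
# address (Group 2) may appear with or without angle brackets.
# The address token is simplified: no quotes, commas, brackets or spaces.
addr_re = re.compile(r'(?:"([^"]*)"\s*)?<?([^<>",\s]+@[^<>",\s]+)>?')

header = '"Mr ABC" <mr@abc.com>, "Foo, Bar" <foo@bar.com>, mr@xyz.com'
print(addr_re.findall(header))
# [('Mr ABC', 'mr@abc.com'), ('Foo, Bar', 'foo@bar.com'), ('', 'mr@xyz.com')]
```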