
Phone Number Regular Expression (Regex) in Python

Dive Into Python 3 has a great little tutorial on building a regular expression for phone numbers: http://diveintopython3.ep.io/regular-expressions.html#phonenumbers

The final version comes out to look like:

phone_re = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$', re.VERBOSE)

This works fine for almost every example I can come up with, but I found a pretty big failure that I can't seem to fix.

If a group of 3 digits comes before the phone number, it works fine, e.g. "500 dollars off, call 123-456-7891".

If a group of 3 digits comes after the phone number, it fails, e.g. "Call 123-456-7891 for a discount of up to 500".

Any ideas on a fix that would work for both examples?

The trailing (\d*)$ ties the match to the end of the string (the $ signifies "end of line"): after the last four digits, the only things allowed before the end are non-digits and then an optional run of digits. Try removing the $ if you're matching against a larger string where the phone number may not be at the end of the line.
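To see what the $ changes, here is a small sketch comparing the pattern with and without it. The names anchored and unanchored and the sample sentence ending in "dollars" are made up for illustration. Note that on the question's second example the anchored pattern does still match, but it picks up the trailing 500 as the extension.

```python
import re

# Pattern from the question, with and without the trailing "$".
anchored = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')
unanchored = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)')

# Second example from the question: the string ends in digits, so "$" is
# satisfied, but "500" is captured as the extension group.
question_text = "Call 123-456-7891 for a discount of up to 500"
question_groups = anchored.search(question_text).groups()

# Hypothetical variant: more text follows the trailing number, so the
# anchored pattern cannot reach the end of the string and finds nothing.
longer_text = "Call 123-456-7891 for a discount of up to 500 dollars"
anchored_match = anchored.search(longer_text)
unanchored_groups = unanchored.search(longer_text).groups()

print(question_groups)
print(anchored_match)
print(unanchored_groups)
```

So dropping the $ lets the pattern match inside a longer string, but \D* can still skip over whole words and grab an unrelated number as the extension.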

Here's your original, with some spaces (use re.VERBOSE, or remove the spaces):

(\d{3}) \D* (\d{3}) \D* (\d{4}) \D* (\d*)

The \D* will match any run of non-digit characters, including whole words. Maybe you should try this:

(\d{3}) \W* (\d{3}) \W* (\d{4}) \W* (\d*)

The \W* matches only non-word characters, that is, anything other than letters, digits, and underscores. It will match (222) - 222 - 2222. However, it will not match if there is a letter between the numbers, as in (222) x 222 - 2222. The last part of the match, (\d*), appears to be looking for an extension. These can be formatted in a variety of ways, so I suggest you either drop it or refine it based on how you expect your data to look. And, as Amber says, you should probably drop the $.
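As a rough sketch, the \W version can be written out with re.VERBOSE and tried against both examples from the question. The find_phone helper is just a name made up for this illustration, and the pattern is only the one suggested above, not a complete phone number validator.

```python
import re

# The \W-based pattern, spread over several lines; re.VERBOSE makes the
# regex engine ignore the whitespace and the "#" comments inside it.
phone_re = re.compile(r'''
    (\d{3}) \W*   # area code, then optional punctuation or spaces
    (\d{3}) \W*   # prefix
    (\d{4}) \W*   # line number
    (\d*)         # optional extension digits (may be empty)
''', re.VERBOSE)

def find_phone(text):
    """Return the captured groups of the first match, or None."""
    match = phone_re.search(text)
    return match.groups() if match else None

print(find_phone("500 dollars off, call 123-456-7891"))
print(find_phone("Call 123-456-7891 for a discount of up to 500"))
print(find_phone("(222) - 222 - 2222"))
print(find_phone("(222) x 222 - 2222"))
```

Because \W* cannot cross letters, the trailing 500 is no longer mistaken for an extension. The flip side is that an extension written like "x1234" is not captured either, so that last group would need refining if your data has extensions.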
