Given the following code:
import re
file_object = open("all-OANC.txt", "r")
file_text = file_object.read()
pattern = r"(\+?1-)?(\()?[0-9]{3}(\))?(-|.)[0-9]{3}(-|.)[0-9]{4}"
for match in re.findall(pattern, file_text):
    print(match)
I get output that stretches like this:
('', '', '', '-', '-')
('', '', '', '-', '-')
('', '', '', '-', '-')
('', '', '', '-', '-')
('', '', '', '-', '-')
('', '', '', '-', '-')
('', '', '', '-', '-')
('', '', '', '-', '-')
('', '', '', '-', '-')
I'm trying to find phone numbers, and I am one hundred percent sure there are numbers in the file. For example, when I run the same expression in an online regex tester, I get matches.
Here is a snippet where the expression matches outside of Python:
"Slate on Paper," our specially formatted print-out version of Slate, is e-mailed to readers Friday around midday. It also can be downloaded from our site. Those services are free. An actual paper edition of "Slate on Paper" can be mailed to you (call 800-555-4995), but that costs money and can take a few days to arrive."
I want output that at least recognizes the presence of a number.
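It's your capture groups that are being displayed. When a pattern contains capture groups, re.findall returns a tuple of the group contents for each match instead of the matched text, and most of your groups are optional and match nothing, hence the empty strings. Use re.finditer and display the whole match with match.group():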
import re

text = '''"Slate on Paper," our specially formatted print-out version of Slate, is e-mailed to readers Friday around midday. It also can be downloaded from our site. Those services are free. An actual paper edition of "Slate on Paper" can be mailed to you (call 800-555-4995), but that costs money and can take a few days to arrive."'''
pattern = r"(\+?1-)?(\()?[0-9]{3}(\))?(-|.)[0-9]{3}(-|.)[0-9]{4}"
for match in re.finditer(pattern, text):
    # group() with no arguments returns the entire matched text
    print(match.group())
Output:
800-555-4995
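If you would rather keep re.findall, one option is to make the groups non-capturing with (?:...), so that findall returns the matched text itself. The sketch below is only an illustration: the sample text and the second phone number are made up, and the separator is rewritten as the character class [-.] on the assumption that a literal dot was intended (in the original pattern the unescaped . matches any character).

```python
import re

# Hypothetical sample text; the real input would be the contents of all-OANC.txt.
text = "mailed to you (call 800-555-4995), or try +1-(212)-555-0199 instead."

# No capture groups: the optional prefix is non-capturing, the parentheses are
# plain optional characters, and [-.] accepts a hyphen or a literal dot
# (an assumption about what the original (-|.) was meant to do).
pattern = r"(?:\+?1-)?\(?[0-9]{3}\)?[-.][0-9]{3}[-.][0-9]{4}"

# With no capture groups, findall returns a list of the full matched strings.
numbers = re.findall(pattern, text)
print(numbers)
```

With this text, findall returns the two phone numbers as plain strings rather than tuples of groups.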