I have the following task:
Use grep with the -Pao options and a regular expression to extract all phone numbers from the broken file (solution: 13 phone numbers). The regular expression should match as closely as possible the following formats of phone numbers and be as short as possible:
I tried to match the beginning of each number format first, planning to combine those pieces afterwards and then extend them step by step.
I now have the following code:
grep -Pao '(\+\d{2}.) | (\d{3,4}) | (\d\s\d{2})' kaputt.txt
(the mode is PCRE)
Unfortunately, the code does not return the desired results; the search conditions seem to be mutually exclusive. I would be grateful for help here.
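Are there spaces on both sides of the pipes? If so, they are part of the pattern (without the x flag, spaces in a PCRE pattern are matched literally), so the first alternative is actually (\+\d{2}.) followed by a literal space, which doesn't match any of the formats.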
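To see the effect, the asker's pattern can be compared with the same three alternatives written without the spaces around the pipes. This is a minimal sketch, assuming GNU grep built with PCRE support (-P); the function names and the sample number are made up for illustration and are not taken from kaputt.txt.

```shell
#!/usr/bin/env bash
# Sketch: assumes GNU grep with PCRE support (-P).
# with_spaces and without_spaces are hypothetical helper names.

# The asker's pattern: the spaces around each | are literal characters,
# so every alternative also needs a space before or after it.
with_spaces() {
  grep -Pao '(\+\d{2}.) | (\d{3,4}) | (\d\s\d{2})'
}

# The same three alternatives with the spaces removed.
without_spaces() {
  grep -Pao '(\+\d{2}.)|(\d{3,4})|(\d\s\d{2})'
}

sample='Tel: +49 30 1234567'   # hypothetical sample line

printf '%s\n' "$sample" | with_spaces      # prints nothing
printf '%s\n' "$sample" | without_spaces   # prints the fragments "+49 ", "0 12" and "3456"
```

Even with the spaces removed, each alternative only matches a piece of a number, which is why each alternative has to describe one complete format instead.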
It would be a fool's errand to try to find the absolute shortest regex possible. The following should be fine: no format seems to be an extension of another, so the order of the alternatives does not matter.
grep -Pao "(?:\+\d\d \d\d \d{7}|\+\d\d \(\d\d\) \d{5} \- \d\d|\+\d\d \(\d\)\d\d \d{5}\-\d\d|\+\d\d-\d\d\-\d{7}|\+\d\d \d\d \d{5}\-\d\d|\d{4} \d \d{6}|\d \d\d \/ \d\d \d\d \d\d|\d{8}\-\d\d)" kaputt.txt
It is just the text of the required formats extracted from your image, with x replaced by \d, - replaced by \-, + replaced by \+, ( and ) replaced by \( and \) so that they match literally instead of forming capture groups, and each format alternative separated by |.
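As a quick check, the answer's pattern (with the parentheses escaped as \( and \) so they match literally rather than forming capture groups) can be run against one sample per format. This is a sketch assuming GNU grep with PCRE support; extract_numbers is a hypothetical helper name, and the sample numbers are invented, not the 13 numbers from kaputt.txt.

```shell
#!/usr/bin/env bash
# Sketch: assumes GNU grep with PCRE support (-P).
# extract_numbers is a hypothetical helper; it reads stdin or the files given.
extract_numbers() {
  grep -Pao '(?:\+\d\d \d\d \d{7}|\+\d\d \(\d\d\) \d{5} \- \d\d|\+\d\d \(\d\)\d\d \d{5}\-\d\d|\+\d\d-\d\d\-\d{7}|\+\d\d \d\d \d{5}\-\d\d|\d{4} \d \d{6}|\d \d\d \/ \d\d \d\d \d\d|\d{8}\-\d\d)' "$@"
}

# Hypothetical input: one made-up number per format, some with text around
# them to show that -o prints only the matched part.
sample=$(cat <<'EOF'
Office +49 30 1234567, ask for Anna
+49 (30) 12345 - 67
+49 (0)30 12345-67
call +49-30-1234567 today
+49 30 12345-67
0301 2 345678
0 30 / 12 34 56
03012345-67
EOF
)

printf '%s\n' "$sample" | extract_numbers           # one number per line
printf '%s\n' "$sample" | extract_numbers | wc -l   # 8 matches, one per format
```

On the real file the call would be extract_numbers kaputt.txt.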
If you want to match across lines, the -z flag is required, and each space could be replaced with, for example, \s+.
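A sketch of the cross-line variant, assuming GNU grep: with -z the input is split on NUL bytes instead of newlines, so a text file without NULs is read as one record and \s+ can also match a line break. Only the first format is shown here; find_split_numbers and the input are hypothetical, and the other alternatives would get the same \s+ treatment.

```shell
#!/usr/bin/env bash
# Sketch: assumes GNU grep with PCRE support (-P) and -z.
# find_split_numbers is a hypothetical helper covering only one format.
find_split_numbers() {
  # -z makes grep treat the whole file as one record and terminate each
  # match with NUL; tr turns newlines inside a match into spaces and the
  # NUL terminators into newlines.
  grep -Pazo '\+\d\d\s+\d\d\s+\d{7}' "$@" | tr '\n\0' ' \n'
}

# Hypothetical input: the second number is broken across two lines.
printf 'Tel +49 30 1234567\nFax +49 30\n7654321\n' | find_split_numbers
```

Without -z, grep works line by line, so the second number would not be found at all.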