Regex to find certain phone numbers in a damaged file

Question

I have the following task:

Use grep with the -Pao options and a regular expression to extract all phone numbers from the broken file (solution: 13 phone numbers). The regular expression should match as closely as possible the following formats of phone numbers and be as short as possible:

I tried to match the beginning of each number format first, with the idea of combining those partial patterns and extending them step by step.

I now have the following code:

grep -Pao '(\+\d{2}.) | (\d{3,4}) | (\d\s\d{2})' kaputt.txt

(the mode is PCRE)

Unfortunately, the code does not return the desired results, as it seems that search conditions are mutually exclusive. I would therefore be grateful for help here.

Answer 1

Are there blanks on both sides of the pipes? If so, they are part of the pattern: the first alternative is actually (\+\d{2}.) followed by a literal space, which doesn't match any of the formats.
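To illustrate the point about blanks, here is a minimal sketch. The sample text is made up for the demonstration and is not taken from kaputt.txt; the patterns are shortened versions of the one in the question.

```shell
# Hypothetical sample line (an assumption, not from kaputt.txt).
sample='call 0123 now'

# With blanks around the pipe, the second alternative is " (\d{3,4}) ",
# so a match must include one space before and one after the digits.
with_blanks=$(printf '%s\n' "$sample" | grep -Pao '(\+\d{2}.) | (\d{3,4}) ')

# Without the blanks, the digits alone are matched.
without_blanks=$(printf '%s\n' "$sample" | grep -Pao '(\+\d{2}.)|(\d{3,4})')

# Brackets make the captured spaces visible.
printf '[%s]\n' "$with_blanks" "$without_blanks"
```

The first bracketed result contains the surrounding spaces, the second does not. A number at the very start or end of a line would not match the blank-padded alternative at all.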

Answer 2

It would be a fool's errand to try and find the absolute shortest regex possible. The following should be fine as no format seems to be an extension of another.

grep -Pao "(?:\+\d\d \d\d \d{7}|\+\d\d \(\d\d\) \d{5} \- \d\d|\+\d\d \(\d\)\d\d \d{5}\-\d\d|\+\d\d\-\d\d\-\d{7}|\+\d\d \d\d \d{5}\-\d\d|\d{4} \d \d{6}|\d \d\d \/ \d\d \d\d \d\d|\d{8}\-\d\d)" kaputt.txt

It is just the text extracted from your image of the required formats, with x replaced by \d, - replaced by \-, + replaced by \+, ( and ) replaced by \( and \) so that they match literal parentheses instead of forming a capture group, and with the format alternatives separated by |.
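The substitution described above can also be done mechanically. The following is a rough sketch, not the answerer's method: the three format templates are reconstructed from the pattern (the original image is not available here), the sample input is invented, and escaping the parentheses is an assumption about what the formats require.

```shell
# Format templates reconstructed from the pattern above (assumption).
formats='+xx xx xxxxxxx
xxxx x xxxxxx
xxxxxxxx-xx'

# Escape +, -, ( and ), turn every x into \d, then join the lines with |.
pattern=$(printf '%s\n' "$formats" \
  | sed -e 's/[-+()]/\\&/g' -e 's/x/\\d/g' \
  | paste -sd '|' -)

# Hypothetical sample input with two numbers embedded in junk.
matches=$(printf 'junk +49 30 1234567 junk\nid 12345678-90 end\n' \
  | grep -Pao -- "$pattern")

printf '%s\n' "$matches"
```

The generated pattern is longer than the handwritten one because it repeats \d instead of using counts such as \d{7}, so it is useful for checking correctness rather than for meeting the "as short as possible" requirement.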

If you want to match across lines, the -z flag is required, and each space could be replaced with, for example, \s+.
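A small sketch of the -z variant, assuming GNU grep and using made-up input in which one number is split over two lines. With -z, grep treats the input as NUL-separated records, so the line break becomes ordinary whitespace that \s+ can match; matches are also written NUL-terminated, which is why tr is used to tidy the output.

```shell
# Hypothetical input: the number is broken across a line boundary.
split_match=$(printf 'x +49 30\n1234567 y\n' \
  | grep -Pzao '\+\d\d\s+\d\d\s+\d{7}' \
  | tr '\n\0' ' \n')

printf '%s\n' "$split_match"
```

Here tr turns the embedded newline into a space and the trailing NUL into a newline, so each match ends up on one line of output.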

2 answers

solution1
0 2022-01-02 12:31:58

solution2
0 2022-01-02 16:27:42
