
Regex to find certain phone numbers in a damaged file

I have the following task:

Use grep with the -Pao options and a regular expression to extract all phone numbers from the damaged file (expected result: 13 phone numbers). The regular expression should match the following phone number formats as closely as possible while being as short as possible:

[Image: the required phone number formats]
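To see what the three options in the task do, here is a minimal sketch. It assumes GNU grep built with PCRE support. The temporary file and the number +49 30 1234567 are invented for the demonstration, and NUL and control bytes stand in for the damage in the real file.

```shell
#!/bin/sh
# Sketch: what -P, -a and -o do on a "damaged" file.
# Assumes GNU grep with -P support; the file content is made up.
tmp=$(mktemp)

# Simulate damage: NUL and other control bytes around a phone number.
printf 'junk\000\001\002 +49 30 1234567 more junk\n' > "$tmp"

# -P: Perl-compatible regex syntax (\d, {n}, ...)
# -a: treat the file as text even though it contains NUL bytes; without it
#     GNU grep only reports that the binary file matches
# -o: print only the matched part, one match per output line
found=$(grep -Pao '\+\d\d \d\d \d{7}' "$tmp")
printf '%s\n' "$found"

rm -f "$tmp"
```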

I tried to match the beginning of each format first, planning to combine those pieces and then extend them step by step.

I now have the following code:

grep -Pao '(\+\d{2}.) | (\d{3,4}) | (\d\s\d{2})' kaputt.txt

(the mode is PCRE)

Unfortunately, the command does not return the desired results; the alternatives seem to be mutually exclusive. I would be grateful for any help.

Are there spaces on both sides of the pipes? If so, the spaces are part of the pattern: the first alternative is actually (\+\d{2}.) followed by a literal space, which doesn't match any of the formats.
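A minimal sketch of the effect, assuming GNU grep with -P support. In a PCRE pattern (without the (?x) flag) spaces are literal characters, so they become part of each alternative. The sample number is invented and follows the "+xx xx xxxxxxx" shape.

```shell
#!/bin/sh
# Sketch: spaces around | are literal, so "a | b" means
# "a followed by a space" OR "a space followed by b".
sample='+49 30 1234567'

# The pattern from the question, spaces around | included.
with_spaces=$(printf '%s\n' "$sample" | grep -Pao '(\+\d{2}.) | (\d{3,4}) | (\d\s\d{2})')

# The same three groups without the surrounding spaces.
without_spaces=$(printf '%s\n' "$sample" | grep -Pao '(\+\d{2}.)|(\d{3,4})|(\d\s\d{2})')

printf 'with spaces:    [%s]\n' "$with_spaces"
printf 'without spaces:\n%s\n' "$without_spaces"
```

With the spaces nothing matches at all. Without them, grep prints three fragments: "+49 " (with a trailing space), "0 12" and "3456". Each alternative matches only a piece of a number, so -o returns pieces rather than whole numbers. That is why each alternative needs to describe one complete format.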

It would be a fool's errand to try to find the absolute shortest regex possible. The following should be fine: no format seems to be an extension of another, so the order of the alternatives does not change which text is matched.
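Why extensions would matter: PCRE alternation is ordered. At each starting position the first alternative that matches wins, even if a later one would match a longer string. This small sketch uses two made-up formats that are not from the task, "xxxx xx" and "xxxx xx xx", and assumes GNU grep with -P support.

```shell
#!/bin/sh
# Sketch: the first matching alternative wins, not the longest one.
sample='1234 56 78'

# Shorter format listed first: it matches and the rest is never tried.
short_first=$(printf '%s\n' "$sample" | grep -Pao '\d{4} \d\d|\d{4} \d\d \d\d')

# Longer format listed first: the whole number is matched.
long_first=$(printf '%s\n' "$sample" | grep -Pao '\d{4} \d\d \d\d|\d{4} \d\d')

printf 'short first: %s\n' "$short_first"   # only part of the number
printf 'long first:  %s\n' "$long_first"    # the whole number
```

If one format were an extension of another, the longer one would have to come first in the alternation.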

grep -Pao '(?:\+\d\d \d\d \d{7}|\+\d\d \(\d\d\) \d{5} \- \d\d|\+\d\d \(\d\)\d\d \d{5}\-\d\d|\+\d\d-\d\d\-\d{7}|\+\d\d \d\d \d{5}\-\d\d|\d{4} \d \d{6}|\d \d\d \/ \d\d \d\d \d\d|\d{8}\-\d\d)' kaputt.txt

It is just the text of the required formats taken from your image, with x replaced by \d, the literal parentheses ( and ) escaped as \( and \), - replaced by \- (not strictly necessary outside a character class), + replaced by \+, and the format alternatives separated by |.
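A quick way to check the regex from the answer (with the literal parentheses escaped as \( and \)) is to run it over a small file containing one example of each format. The sketch below assumes GNU grep with -P support. All numbers and labels in the sample file are invented; it does not reproduce kaputt.txt or its 13 numbers.

```shell
#!/bin/sh
# Sketch: one invented example per format, then count what grep extracts.
tmp=$(mktemp)
printf '%s\n' \
  'phone: +49 30 1234567' \
  'mobile: +49 (30) 12345 - 67' \
  'office: +49 (0)30 12345-67' \
  'fax: +49-30-1234567' \
  'desk: +49 30 12345-67' \
  'old: 0301 2 345678' \
  'home: 0 30 / 12 34 56' \
  'direct: 03012345-67' > "$tmp"

# One alternative per format, inside a non-capturing group.
re='(?:\+\d\d \d\d \d{7}|\+\d\d \(\d\d\) \d{5} \- \d\d|\+\d\d \(\d\)\d\d \d{5}\-\d\d|\+\d\d-\d\d\-\d{7}|\+\d\d \d\d \d{5}\-\d\d|\d{4} \d \d{6}|\d \d\d \/ \d\d \d\d \d\d|\d{8}\-\d\d)'

matches=$(grep -Pao "$re" "$tmp")
printf '%s\n' "$matches"

# -o prints one match per line, so counting lines counts matches.
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
printf 'matches: %s\n' "$count"

rm -f "$tmp"
```

Each sample line should produce exactly one complete number. With unescaped parentheses, the second and third formats would be treated as capturing groups and would no longer match the bracketed numbers.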

If you want to match numbers that are split across lines, the -z flag is required (grep then separates input records at NUL bytes instead of newlines), and each space could be replaced with, for example, \s+.
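A minimal sketch of that variant, assuming GNU grep with -P support. The sample text, with a number broken over two lines, is invented. Only one format is used here to keep it short.

```shell
#!/bin/sh
# Sketch: a number split by a line break, matched with -z and \s+.
tmp=$(mktemp)
printf 'call +49 30\n1234567 today\n' > "$tmp"

re='\+\d\d\s+\d\d\s+\d{7}'

# Default: grep works line by line, so no match can contain the newline.
line_mode=$(grep -Pao "$re" "$tmp")

# -z: records are NUL-terminated; this file has no NUL, so it is one record
# and \s+ can match the newline. Output records end in NUL, turned back
# into newlines here.
z_mode=$(grep -Paoz "$re" "$tmp" | tr '\0' '\n')

printf 'without -z: [%s]\n' "$line_mode"
printf 'with -z:\n%s\n' "$z_mode"
rm -f "$tmp"
```

Note that a damaged file containing NUL bytes would still be split at those bytes under -z.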
