Regex to find certain phone numbers in a damaged file

I have the following task:

Use grep with the -Pao options and a regular expression to extract all phone numbers from the broken file (solution: 13 phone numbers). The regular expression should match the following phone number formats as closely as possible and be as short as possible:

[Image: list of the required phone number formats]

I tried to start from the beginning of each number format, planning to combine the pieces and build up from there.

I now have the following code:

grep -Pao '(\+\d{2}.) | (\d{3,4}) | (\d\s\d{2})' kaputt.txt

(the mode is PCRE)

Unfortunately, the code does not return the desired results; it seems that the search conditions are mutually exclusive. I would therefore be grateful for help here.

Are there blanks on both sides of the pipes? If yes, the first case actually is (\+\d{2}.)\s, which doesn't match any of the formats.
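To illustrate the point about the blanks, here is a minimal sketch comparing the pattern from the question with a variant that drops the spaces around the pipes. The sample line and the variable names `sample`, `spaced` and `tight` are made up for illustration and are not taken from kaputt.txt; GNU grep with PCRE support is assumed.

```shell
# Hypothetical sample input, not taken from kaputt.txt.
sample='call +49 30 1234567 now'

# Spaces around the pipes are literal characters: the first branch needs a
# trailing blank, the other two need a leading blank (and the second a trailing one).
spaced=$(printf '%s\n' "$sample" | grep -Pao '(\+\d{2}.) | (\d{3,4}) | (\d\s\d{2})' || true)

# Without the blanks, each branch is tried exactly as written.
tight=$(printf '%s\n' "$sample" | grep -Pao '(\+\d{2}.)|(\d{3,4})|(\d\s\d{2})' || true)

printf 'spaced:\n%s\ntight:\n%s\n' "$spaced" "$tight"
```

On this sample the spaced pattern should find nothing, while the tight one returns fragments of the number, which also shows that the three branches match only the beginnings of numbers, not whole numbers.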

It would be a fool's errand to try to find the absolute shortest regex possible. The following should be fine, as no format seems to be an extension of another.

grep -Pao "(?:\+\d\d \d\d \d{7}|\+\d\d \(\d\d\) \d{5} \- \d\d|\+\d\d \(\d\)\d\d \d{5}\-\d\d|\+\d\d-\d\d\-\d{7}|\+\d\d \d\d \d{5}\-\d\d|\d{4} \d \d{6}|\d \d\d \/ \d\d \d\d \d\d|\d{8}\-\d\d)" kaputt.txt

It is just the text of the required formats extracted from your image, with x replaced by \d, - replaced by \-, + replaced by \+, and the format alternatives separated by |.
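The same substitution can be done mechanically. The sketch below is only an illustration: the list in `formats` is reconstructed from the regex above (the image itself is not available here), it assumes the parentheses are literal characters of the formats, and the names `formats` and `pattern` are hypothetical. It escapes `+`, `(` and `)`, turns every `x` into `\d`, and joins the lines with `|`. Hyphens and slashes are left alone because they need no escaping outside a character class in PCRE.

```shell
# Format templates reconstructed from the regex in the answer (an assumption;
# the original image is not available). Each x stands for one digit.
formats='+xx xx xxxxxxx
+xx (xx) xxxxx - xx
+xx (x)xx xxxxx-xx
+xx-xx-xxxxxxx
+xx xx xxxxx-xx
xxxx x xxxxxx
x xx / xx xx xx
xxxxxxxx-xx'

# Escape +, ( and ), replace each x with \d, then join all lines with |.
pattern=$(printf '%s\n' "$formats" \
  | sed -e 's/[+()]/\\&/g' -e 's/x/\\d/g' \
  | paste -sd '|' -)

printf '%s\n' "$pattern"
```

The result could then be passed to grep as `grep -Pao "$pattern" kaputt.txt`. It is longer than the hand-written version because it repeats `\d` instead of using quantifiers such as `\d{7}`, but it should match the same strings.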

If you want to match across lines, then the -z flag is required and each space could be replaced with, for example, \s+.
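A minimal sketch of the difference, using only the first format and a made-up input in which the number is split by a line break (the variable names are hypothetical, and GNU grep is assumed). With -z, grep separates its output with NUL bytes, so `tr` converts them back to newlines for display.

```shell
# Hypothetical input where one number is broken across two lines.
broken=$(printf 'junk +49 30\n1234567 junk')

# With -z the whole input is one record, so \s+ can span the line break.
with_z=$(printf '%s' "$broken" | grep -Pzao '\+\d\d\s+\d\d\s+\d{7}' | tr '\0' '\n')

# Without -z each line is searched on its own, so the split number is missed.
without_z=$(printf '%s' "$broken" | grep -Pao '\+\d\d\s+\d\d\s+\d{7}' || true)

printf 'with -z:\n%s\nwithout -z:\n%s\n' "$with_z" "$without_z"
```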
