
How to capture an 11-digit number preceded by two, three, or four characters, one of which may be a dot, using regex

I need to search an HTML page for a sequence of digits that could look like this:

p.fg 67389109321 or pfg 67389109321 or pf 67389109321

After parsing the HTML page, I convert it to a string:

String Pagestring = Page.toString().toLowerCase().replaceAll("  <[^>]+>","");

and I use this regex to capture the 11 digits:

final Matcher m = Pattern.compile("(?<!\\d)\\d{11}(?!\\d)").matcher(Pagestring);

But it captures the first instance of 11 digits, whatever comes before it. I need it to account for the prefixes above.

Straightforward: define the possible prefixes and separate them with "or" ( | ), then match the 11 digits:

(p\.fg|pfg|p\.f) \d{11}

This means:

  • ( : opens the group that delimits the alternation
  • p\.fg : literal p.fg
  • | : or
  • pfg : literal pfg
  • | : or
  • p\.f : literal p.f
  • ) : closes the group
  • " " : literal space
  • \d{11} : eleven digits
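As a rough illustration of how the pattern above could be used from Java, here is a minimal sketch. The class name PrefixedDigits and the method findNumbers are made up for this example, and the second capture group around the digits is an addition so that only the number is returned; the backslashes are doubled because the pattern sits in a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixedDigits {
    // Group 1 is the prefix, group 2 the eleven digits (group 2 is an
    // addition for this sketch, not part of the pattern shown above).
    private static final Pattern PATTERN =
            Pattern.compile("(p\\.fg|pfg|p\\.f) (\\d{11})");

    /** Returns every 11-digit run that follows one of the accepted prefixes. */
    static List<String> findNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            numbers.add(m.group(2));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String sample = "p.fg 67389109321 or pfg 67389109321 or p.f 67389109321";
        // prints [67389109321, 67389109321, 67389109321]
        System.out.println(findNumbers(sample));
    }
}
```

Using find() in a loop, rather than a single call, is what returns every occurrence instead of only the first one.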


That said, removing HTML tags the way you do ( replaceAll(" <[^>]+>",""); ) is not reliable. Use an HTML-specific tool like HtmlAgilityPack (for .NET) or, in Java, a parser such as jsoup. That regex might fail on HTML like:

<tag attribute=">"/>
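To make the failure concrete, the following sketch runs a tag-stripping regex over that kind of input. It assumes the pattern without the leading whitespace that appears in the question's version, and the class name TagStripDemo and the sample text after the tag are invented for the demonstration.

```java
public class TagStripDemo {
    /** Strips anything that looks like a tag, the same idea as in the question. */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        // The '>' inside the attribute value ends the match too early,
        // so the tail of the tag is left behind in the text.
        // prints "/>pfg 67389109321
        System.out.println(stripTags("<tag attribute=\">\"/>pfg 67389109321"));
    }
}
```

The leftover "/> does not break the digit search in this particular case, but it shows that the stripped string is not guaranteed to be clean text.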

Regex: p(?:\\.?fg|\\.f)\\s\\d{11}

Details:

  • (?:) Non-capturing group
  • | Or
  • \\s Matches any whitespace character

Java code:

String string = "p.fg 67389109321 or  pfg 67389109321 or  p.f 67389109321";
Matcher matches = Pattern.compile("p(?:\\.?fg|\\.f)\\s\\d{11}").matcher(string);
while (matches.find()) {
    System.out.println(matches.group(0));
}

Output:

p.fg 67389109321
pfg 67389109321
p.f 67389109321
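If the number must be exactly 11 digits and the prefix must not be the tail of a longer word, the pattern can be tightened. The sketch below is one possible variant, not part of the answer: it reuses the (?!\\d) lookahead from the question, and the \\b word boundary and the capture group around the digits are assumptions added here. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictPrefixedDigits {
    // \\b      : the prefix must start at a word boundary (assumption)
    // (?!\\d)  : the eleven digits must not be followed by another digit
    private static final Pattern PATTERN =
            Pattern.compile("\\bp(?:\\.?fg|\\.f)\\s(\\d{11})(?!\\d)");

    /** Returns the digits of every match; group 1 holds the number itself. */
    static List<String> findNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Only the first entry qualifies: the second has 12 digits,
        // the third has the prefix glued to a preceding letter.
        System.out.println(findNumbers(
                "pfg 67389109321, p.f 673891093219, xpfg 67389109321"));
    }
}
```

Whether the word boundary is wanted depends on how the page text looks after the tags are removed, so treat it as optional.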

