How to capture 11 digits preceded by two, three, or four characters (one of which may be a dot) using regex
I need to search an HTML page for a sequence of digits that can appear in any of these forms:
p.fg 67389109321
or pfg 67389109321
or pf 67389109321
After parsing the HTML page, I convert it to a string:
String Pagestring = Page.toString().toLowerCase().replaceAll(" <[^>]+>","");
and I use this regex to capture the 11 digits:
final Matcher m = Pattern.compile(("(?<!\\d)\\d{11}(?!\\d)")).matcher(Page );
But it captures the first run of 11 digits it finds, whatever precedes it. I need it to match only when the digits follow one of the prefixes above.
Straightforward: define the possible prefixes, separate them with "or" ( | ), then match the 11 digits:
(p\.fg|pfg|p\.f) \d{11}
This means:
( : opens the group that delimits the "or" operation
p\.fg : literal p.fg
| : or
pfg : literal pfg
| : or
p\.f : literal p.f
) : closes the group
(space) : literal space
\d{11} : eleven digits
That said, removing HTML tags the way you do ( replaceAll(" <[^>]+>",""); ) is not reliable. Use an HTML-specific tool like HtmlAgilityPack (a .NET library; jsoup is a comparable option for Java). That regex might fail on HTML like
<tag attribute=">"/>
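To make the failure concrete, here is a minimal sketch of what the tag-stripping regex does with that input. The class and method names (TagStripDemo, stripTags) are made up for illustration, and the pattern is written without the leading space shown in the question, on the assumption that the space is a transcription artifact.

```java
public class TagStripDemo {
    // Naive tag removal: delete anything that looks like <...>.
    // Hypothetical helper, modelled on the replaceAll call in the question.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        String simple = "<b>pfg 67389109321</b>";
        String tricky = "<tag attribute=\">\"/>pfg 67389109321";

        // Well-behaved markup is stripped cleanly.
        System.out.println(stripTags(simple));
        // The '>' inside the attribute value ends the match early,
        // so the tail of the tag is left behind in the text.
        System.out.println(stripTags(tricky));
    }
}
```

The second call leaves the characters `"/>` in front of the text, because `[^>]+` stops at the first `>` it sees, which here sits inside the attribute value.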
Regex: p(?:\.?fg|\.f)\s\d{11} (in a Java string literal, each backslash is doubled)
Details:
(?:) : non-capturing group
| : or
\s : matches any whitespace character
Java code:
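import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = "p.fg 67389109321 or pfg 67389109321 or p.f 67389109321";
// p, then ".fg", "fg", or ".f"; one whitespace character; eleven digits
Matcher matches = Pattern.compile("p(?:\\.?fg|\\.f)\\s\\d{11}").matcher(string);
while (matches.find()) {
    System.out.println(matches.group(0)); // group(0) is the whole match, prefix included
}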
Output:
p.fg 67389109321
pfg 67389109321
p.f 67389109321
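Both patterns above return the prefix together with the digits. If only the 11 digits are wanted, as the question asks, one option is to put them in a capturing group and read group(1). The sketch below is an assumption-laden variant rather than either answerer's pattern: the prefix `p\.?fg?` is chosen to accept p.fg, pfg, p.f and also the bare pf from the question, the lookbehind assumes lowercased input (the question calls toLowerCase()), and the names DigitCapture and extractNumbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitCapture {
    // (?<![a-z])  the prefix must not be the tail of a longer lowercase word
    // p\.?fg?     p, optional dot, f, optional g (assumed set of prefixes)
    // \s          one whitespace character
    // (\d{11})    group 1: exactly eleven digits
    // (?!\d)      not followed by a twelfth digit
    private static final Pattern PREFIXED_NUMBER =
            Pattern.compile("(?<![a-z])p\\.?fg?\\s(\\d{11})(?!\\d)");

    // Returns only the digit runs, without their prefixes.
    static List<String> extractNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = PREFIXED_NUMBER.matcher(text);
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String text = "p.fg 67389109321 or pfg 12345678901 or pf 11111111111, "
                + "but not xyz 22222222222 or pfg 123456789012";
        System.out.println(extractNumbers(text));
    }
}
```

The trailing `(?!\d)` is taken from the pattern in the question, so a 12-digit run after a valid prefix is rejected instead of being truncated to its first 11 digits.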