Java Regular Expression Extract String Between Two Words
I have a string that looks like this:
<br/><description>Using a combination of remote probes, (TCP/IP, SMB, HTTP, NTP, SNMP, etc...) it is possible to guess the name of the remote operating system in use, and sometimes its version.</description><br/><fname>os_fingerprint.nasl</fname><br/><plugin_modification_date>2012/12/01</plugin_modification_date><br/><plugin_name>OS Identification</plugin_name><br/><plugin_publication_date>2003/12/09</plugin_publication_date><br/><plugin_type>combined</plugin_type><br/><risk_factor>None</risk_factor><br/><solution>n/a</solution><br/><synopsis>It is possible to guess the remote operating system.</synopsis><br/><plugin_output><br/>Remote operating system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>Confidence Level : 99<br/>Method : MSRPC<br/><br/> <br/>The remote host is running Microsoft Windows Server 2008 R2 Enterprise Service Pack 1</plugin_output><br/>
I want to locate "Remote operating system :" and extract "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1" from this part of the string:
Remote operating system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>
So I used this regular expression:
Pattern pattern = Pattern.compile("(?<=\\bRemote operating system :\\b).*?(?=\\b<br/>\\b)");
But my regex does not seem to work. Any ideas? Also, is this a good way to extract the OS string, or should I take a different approach? Thanks!
Try this pattern: ".*Remote operating system : (.*?)<br/>"
public static void main(String[] args) throws Exception {
String s = "<br/><description>Using a combination of remote probes, (TCP/IP, SMB, HTTP, NTP, SNMP, etc...) it is possible to guess the name of the remote operating system in use, and sometimes its version.</description><br/><fname>os_fingerprint.nasl</fname><br/><plugin_modification_date>2012/12/01</plugin_modification_date><br/><plugin_name>OS Identification</plugin_name><br/><plugin_publication_date>2003/12/09</plugin_publication_date><br/><plugin_type>combined</plugin_type><br/><risk_factor>None</risk_factor><br/><solution>n/a</solution><br/><synopsis>It is possible to guess the remote operating system.</synopsis><br/><plugin_output><br/>Remote operating system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>Confidence Level : 99<br/>Method : MSRPC<br/><br/> <br/>The remote host is running Microsoft Windows Server 2008 R2 Enterprise Service Pack 1</plugin_output><br/>";
Pattern pattern = Pattern.compile(".*Remote operating system : (.*?)<br/>");
Matcher m = pattern.matcher(s);
if (m.find()) {
System.out.println(m.group(1));
}
else System.out.println("Not found");
}
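The same capture-group idea can be generalized to any pair of delimiters. The following is only a sketch: the class name BetweenExtractor and the method extractBetween are hypothetical, and the use of Pattern.quote and Pattern.DOTALL is an assumption added here so that the delimiters are treated literally and the input may contain line breaks.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class BetweenExtractor {

    // Returns the shortest text between the first occurrence of `start`
    // and the next occurrence of `end`, or empty if there is no such pair.
    static Optional<String> extractBetween(String text, String start, String end) {
        // Pattern.quote makes both delimiters literal, so characters such as
        // '.' or '(' inside them are not interpreted as regex syntax.
        // DOTALL lets '.' cross line breaks in case the input has any.
        Pattern p = Pattern.compile(
                Pattern.quote(start) + "(.*?)" + Pattern.quote(end),
                Pattern.DOTALL);
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened version of the string from the question.
        String s = "<plugin_output><br/>Remote operating system : "
                + "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1"
                + "<br/>Confidence Level : 99<br/>";
        System.out.println(
                extractBetween(s, "Remote operating system : ", "<br/>")
                        .orElse("Not found"));
    }
}
```

Returning an Optional instead of printing "Not found" leaves the decision about a missing match to the caller.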
There is no space after ":" and before "\\b" in your regular expression.
Try it this way:
Pattern.compile("(?<=\\bRemote operating system : \\b).*?(?=\\b<br/>\\b)");
//                                               ^ additional space
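Without that space, "\\b" cannot match the start of the next word ("Microsoft"). In the original pattern it sits directly after ":", where the next character is a space; neither ":" nor a space is a word character, so there is no word boundary at that position and the lookbehind never succeeds.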
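The word-boundary behavior described above can be checked in isolation. This is a small sketch; the class WordBoundaryDemo and the helper found are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class WordBoundaryDemo {

    // True if the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "system : Microsoft";

        // ':' and ' ' are both non-word characters, so there is no
        // word boundary between them and this pattern never matches.
        System.out.println(found("system :\\b", input));   // false

        // ' ' followed by 'M' is a non-word/word transition, so \b matches.
        System.out.println(found("system : \\b", input));  // true
    }
}
```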
String test =
"<br/><description>Using a combination of remote probes, " +
"(TCP/IP, SMB, HTTP, NTP, SNMP, etc...) it is possible to guess " +
"the name of the remote operating system in use, and sometimes " +
"its version.</description><br/><fname>os_fingerprint.nasl</fname>" +
"<br/><plugin_modification_date>2012/12/01</plugin_modification_date>" +
"<br/><plugin_name>OS Identification</plugin_name><br/>" +
"<plugin_publication_date>2003/12/09</plugin_publication_date><br/>" +
"<plugin_type>combined</plugin_type><br/><risk_factor>None</risk_factor>" +
"<br/><solution>n/a</solution><br/><synopsis>It is possible to guess the " +
"remote operating system.</synopsis><br/><plugin_output><br/>Remote operating " +
"system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>" +
"Confidence Level : 99<br/>Method : MSRPC<br/><br/> <br/>The remote host is " +
"running Microsoft Windows Server 2008 R2 Enterprise Service Pack 1" +
"</plugin_output><br/>";
Pattern pattern = Pattern.compile("Remote\\soperating\\ssystem\\s:\\s(.+?)\\<br/>");
Matcher matcher = pattern.matcher(test);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
Microsoft Windows Server 2008 R2 Enterprise Service Pack 1
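Note that using regular expressions against markup languages is generally not recommended. Here, however, you are using a regex to find one specific text string that just happens to sit inside markup, so I guess it is fine.
Try this: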
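Because the target is a fixed label followed by text that runs up to the next "<br/>", a plain string search is one possible alternative that avoids regular expressions entirely. This is only a sketch of that different technique; the class PlainSearch and the method after are hypothetical names.

```java
// Hypothetical alternative that uses indexOf/substring instead of a regex.
class PlainSearch {

    // Returns the text between `label` and the next `terminator`,
    // or null when either one is missing.
    static String after(String text, String label, String terminator) {
        int start = text.indexOf(label);
        if (start < 0) {
            return null;
        }
        start += label.length();               // skip past the label itself
        int end = text.indexOf(terminator, start);
        return end < 0 ? null : text.substring(start, end);
    }

    public static void main(String[] args) {
        // Shortened version of the string from the question.
        String s = "<plugin_output><br/>Remote operating system : "
                + "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1"
                + "<br/>Confidence Level : 99<br/>";
        System.out.println(after(s, "Remote operating system : ", "<br/>"));
    }
}
```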
if (str.matches("^.*Remote operating system : ([^<]*).*$")) {
System.out.println(
str.replaceAll("^.*Remote operating system : ([^<]*).*$", "$1")
);
}
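The snippet above evaluates the same expression twice, once in matches and once in replaceAll. A variant that compiles the pattern once and reads the capture group directly might look like the sketch below. The names SinglePassMatch, OS_LINE and osName are hypothetical, and the Pattern.DOTALL flag is an addition: by default "." does not match line terminators, so matches() would fail on input that contains line breaks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the matches/replaceAll approach.
class SinglePassMatch {

    // Same expression as above, compiled once. DOTALL is an assumption
    // added here so that '.' also matches line breaks.
    private static final Pattern OS_LINE = Pattern.compile(
            "^.*Remote operating system : ([^<]*).*$", Pattern.DOTALL);

    // Returns the captured OS name, or null if the label is absent.
    static String osName(String str) {
        Matcher m = OS_LINE.matcher(str);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "<plugin_output><br/>\nRemote operating system : "
                + "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1"
                + "<br/>\nConfidence Level : 99<br/>";
        System.out.println(osName(str));
    }
}
```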