Extract text between HTML tags with an unknown tag name?
<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>....
I want to extract everything that comes between <b>Topic1</b> and the next <b> starting tag. In this case that would be: <ul>asdasd</ul><br/>
Problem: it need not necessarily be the <b> tag; it could be any other repeating tag.
So my question is: how can I dynamically extract this text? The only static things are:
<b>, it might as well be <i> or <strong> or <h1> etc.
I know how to write the Java code, but what would the regex be like?
String regex = ">Topic1<";
Matcher m = Pattern.compile(regex).matcher(text);
while (m.find()) {
    for (int i = 1; i <= m.groupCount(); i++) {
        System.out.println(m.group(i));
    }
}
The following should work:
Topic1</(.+?)>(.*?)<\1>   (the backreference \1 is written \\1 inside a Java string literal)
Input: <b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>
Output: <ul>asdasd</ul><br/>
Code:
// group 1 captures the tag name, group 2 the text up to the next <tag>
Pattern p = Pattern.compile("Topic1</(.+?)>(.*?)<\\1>");
// get a matcher object
Matcher m = p.matcher("<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>");
while (m.find()) {
    System.out.println(m.group(2)); // <ul>asdasd</ul><br/>
}
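As a hedged variation on the answer above, the same backreference idea can be wrapped in a small helper. This is only a sketch: the class name TagExtractor and the method extractAfter are hypothetical, and it assumes the keyword is immediately followed by its closing tag and that tag names consist of word characters. It also lets the next opening tag carry attributes, which the original pattern does not.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagExtractor {
    // Hypothetical helper: returns the markup between the closing tag that
    // follows `keyword` and the next opening tag with the same name.
    static Optional<String> extractAfter(String html, String keyword) {
        // Pattern.quote keeps regex metacharacters in the keyword literal.
        // Group 1 = tag name, group 2 = content; DOTALL lets '.' span newlines.
        // [\s>] after the backreference stops <b> from matching <br/>.
        Pattern p = Pattern.compile(
                Pattern.quote(keyword) + "</(\\w+)>(.*?)<\\1[\\s>]",
                Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>";
        // prints <ul>asdasd</ul><br/>
        System.out.println(extractAfter(html, "Topic1").orElse("(no match)"));
    }
}
```

Regular expressions remain fragile on real-world HTML (nested tags, comments, unusual attributes), so this is best treated as a fit for small, predictable fragments like the one in the question.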
Try this:
String pattern = "\\<.*?\\>Topic1\\<.*?\\>"; // this will see the tag no matter what tag it is
String text = "<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b>"; // your string to be split
String[] attributes = text.split(pattern);
for (String atr : attributes) {
    System.out.println(atr);
}
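Will print out (after an empty first line, because split keeps the empty string that precedes a match at the start of the input):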
<ul>asdasd</ul><br/><b>Topic2</b>
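The split output above still contains the following <b>Topic2</b> heading. One possible way to cut the result at the next opening tag of the same name is sketched below. The class SectionCutter and the method sectionAfter are hypothetical names, and the sketch assumes the keyword is the entire content of its enclosing element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionCutter {
    // Hypothetical helper: finds the element wrapping `keyword`, then cuts
    // the remaining text at the next opening tag with the same name.
    static String sectionAfter(String text, String keyword) {
        // Group 1 captures the tag name; [^>]* tolerates attributes.
        Matcher heading = Pattern
                .compile("<(\\w+)\\b[^>]*>" + Pattern.quote(keyword) + "</\\1>")
                .matcher(text);
        if (!heading.find()) {
            return ""; // keyword is not wrapped in a matching tag pair
        }
        String rest = text.substring(heading.end());
        // Next opening tag of the same name, e.g. <b> or <b class="x">,
        // but not a longer name such as <br/>.
        Matcher next = Pattern
                .compile("<" + Pattern.quote(heading.group(1)) + "[\\s>]")
                .matcher(rest);
        return next.find() ? rest.substring(0, next.start()) : rest;
    }

    public static void main(String[] args) {
        String text = "<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b>";
        // prints <ul>asdasd</ul><br/>
        System.out.println(sectionAfter(text, "Topic1"));
    }
}
```

Doing the search in two steps avoids a lazy match with a backreference and makes the "no following tag" case explicit: the helper then returns everything after the heading.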