Extract between HTML tags with an unknown tag name?

Question

<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>....

I want to extract everything that comes between Topic1 and the next <b> starting tag, which in this case would be: <ul>asdasd</ul>.

Problem: it need not necessarily be the <b> tag; it could be any other repeating tag.

So my question is: how can I extract that text dynamically? The only fixed things are:

The signal keyword to look for is always "Topic1". I'd like to take the tags surrounding it as the ones to look for.
The tag is always repeated. In this case it is always <b>, but it might as well be some other tag, such as <h1>.

I know how to write the Java code, but what would the regex look like?

String regex = ">Topic1<";
Matcher m = Pattern.compile(regex).matcher(text);
while (m.find()) {
    for (int i = 1; i <= m.groupCount(); i++) {
        System.out.println(m.group(i));
    }
}

Answer 1

The following should work (in a Java string literal the backreference \1 is written as \\1):

Topic1</(.+?)>(.*?)<\1>

Input: <b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>

Output: <ul>asdasd</ul><br/>

Code:

    Pattern p = Pattern.compile("Topic1</(.+?)>(.*?)<\\1>");
    //  get a matcher object
    Matcher m = p.matcher("<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>");
    while(m.find()) {
        System.out.println(m.group(2));  // <ul>asdasd</ul><br/>
    }
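The same idea can be packaged as a self-contained sketch. The class name TopicExtractor and the method extractAfter are hypothetical names introduced here for illustration, not part of the original answer. Two assumptions differ from the pattern above: the tag name is restricted to letters and digits instead of .+?, and the keyword is passed through Pattern.quote so that it is matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TopicExtractor {
    // Hypothetical helper: returns the text between the closing tag that
    // follows `keyword` and the next opening tag with the same name.
    static List<String> extractAfter(String html, String keyword) {
        Pattern p = Pattern.compile(
            Pattern.quote(keyword)          // literal keyword, e.g. "Topic1"
            + "</([A-Za-z][A-Za-z0-9]*)>"   // group 1: name of the closing tag
            + "(.*?)"                       // group 2: content, as short as possible
            + "<\\1>",                      // next opening tag with the same name
            Pattern.DOTALL);                // let '.' match line breaks too
        List<String> results = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            results.add(m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>";
        System.out.println(extractAfter(html, "Topic1")); // [<ul>asdasd</ul><br/>]
    }
}
```

Note that this sketch only recognises a following opening tag without attributes (for example <b>, not <b class="x">), and the backreference is case-sensitive. For arbitrary real-world HTML an HTML parser is usually more robust than a regex.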

Answer 2

Try this:

String pattern = "\\<.*?\\>Topic1\\<.*?\\>"; // this will see the tag no matter what tag it is
String text = "<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b>"; // your string to be split
String[] attributes = text.split(pattern);
for(String atr : attributes) 
{
    System.out.println(atr);
}

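This prints an empty line first (the text begins with the delimiter, so split returns a leading empty string), followed by: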

<ul>asdasd</ul><br/><b>Topic2</b>
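Because the split keeps everything after the delimiter, the result still contains the following topic. One possible way to stop at the next tag of the same name is sketched below. The class TopicSlicer and the method sliceAfter are hypothetical names, and the sketch assumes the keyword is wrapped directly in a matching pair of attribute-free tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TopicSlicer {
    // Hypothetical helper: finds "<tag>keyword</tag>", then cuts the rest of
    // the text at the next "<tag>". Returns null if the keyword is not found.
    static String sliceAfter(String text, String keyword) {
        Pattern delimiter = Pattern.compile(
            "<([A-Za-z][A-Za-z0-9]*)>" + Pattern.quote(keyword) + "</\\1>");
        Matcher m = delimiter.matcher(text);
        if (!m.find()) {
            return null;
        }
        String openTag = "<" + m.group(1) + ">";   // e.g. "<b>"
        int start = m.end();                       // first char after the delimiter
        int next = text.indexOf(openTag, start);   // next opening tag of the same name
        return next < 0 ? text.substring(start)    // no further topic: take the rest
                        : text.substring(start, next);
    }

    public static void main(String[] args) {
        String text = "<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b>";
        System.out.println(sliceAfter(text, "Topic1")); // <ul>asdasd</ul><br/>
    }
}
```

The tag name is taken from the match itself, so the same call works when the topics are wrapped in <h1> or any other attribute-free tag.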

2 answers

Answer 1: score 2, accepted, 2016-01-12 16:31:16

Answer 2: score 0, 2016-01-12 16:31:32
