Java Pattern regex to extract between tags

I am trying to design my own custom XML reader for RSS feeds. Below is the Java code I am testing:

Pattern pattern = Pattern.compile("<(item)(.*?)>((.*))</\\1>", Pattern.CASE_INSENSITIVE);

Matcher matcher = pattern.matcher("<item value=\"key\" atr='none'><title val=\"has value\">Good</title><link>www</link></item>"
+ "<item value=\"key\" atr='none'><title val=\"has value\">Bad</title><link>http</link></item>"
+ "<item value=\"key\" atr='none'><title val=\"has value\">Neutral</title><link>ftp</link></item>");

while (matcher.find()) {
    for (int i = 0; i < matcher.groupCount(); i++) {
        System.out.println("\n" + i + ":" + matcher.group(i));
    }
}

Here is the output:

0:<item value="key" atr='none'><title val="has value">Good</title><link>www</link></item><item value="key" atr='none'><title val="has value">Bad</title><link>http</link></item><item value="key" atr='none'><title val="has value">Neutral</title><link>ftp</link></item>

1:item

2: value="key" atr='none'

3:<title val="has value">Good</title><link>www</link></item><item value="key" atr='none'><title val="has value">Bad</title><link>http</link></item><item value="key" atr='none'><title val="has value">Neutral</title><link>ftp</link>

Desired output:

<title val="has value">Good</title><link>www</link>
<title val="has value">Bad</title><link>http</link>
<title val="has value">Neutral</title><link>ftp</link>

Basically, I want the loop to run once for each item tag present in the source string. Currently the 3rd group in the regex extracts the string up to the last end tag matching the 1st group, which is not what I want. The 3rd group should contain the string only up to the respective end tag of the 1st group.

EDIT: On the recommendation of @11thdimension, I am adding some more information about what I need:

  1. The XML structure can also contain other tags inside the ITEM tag, like date, author, etc. The code should retrieve those tags along with the title & link tags.
  2. The order of the tags is not fixed. They can come in any order: title, link, date or link, title, date or date, link, title etc.

You should use an XML parser as suggested by Lucero.
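
As a rough illustration of the parser route, here is a minimal sketch using the DOM API that ships with the JDK (javax.xml.parsers). The class name ItemDomSketch, the method readItems and the <channel> wrapper are assumptions made for this example, not part of the question. The wrapper is there because a parser needs a single root element, which the test string in the question does not have.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named for this example only.
class ItemDomSketch {

    // Returns one map per <item>: child tag name -> text content, in document order.
    static List<Map<String, String>> readItems(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<Map<String, String>> items = new ArrayList<>();
            NodeList itemNodes = doc.getElementsByTagName("item");
            for (int i = 0; i < itemNodes.getLength(); i++) {
                Map<String, String> fields = new LinkedHashMap<>();
                NodeList children = itemNodes.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    // Skip whitespace and text nodes; keep only child elements.
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        fields.put(child.getNodeName(), child.getTextContent());
                    }
                }
                items.add(fields);
            }
            return items;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed wrapper element <channel> so that the input has a single root.
        String xml = "<channel>"
                + "<item value=\"key\" atr='none'><title val=\"has value\">Good</title><link>www</link></item>"
                + "<item value=\"key\" atr='none'><link>http</link><title val=\"has value\">Bad</title></item>"
                + "</channel>";
        for (Map<String, String> item : readItems(xml)) {
            System.out.println(item);
        }
    }
}
```

Tag order inside an item does not matter here, and extra tags such as date or author are picked up as well. Note that getElementsByTagName is case-sensitive, and that if a tag repeats inside one item this sketch keeps only the last occurrence.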

However, if you must use regex, you can use the following (it assumes that title comes first and link comes last inside each item):

<title.*?<\/link>

Working regex101 link: https://regex101.com/r/EWG2Io/2

Edit

For the special case where you need everything inside <item></item>, use the following:

<item.*?>(.*?)<\/item>

Working example: https://regex101.com/r/Ow1A5F/1
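
The part that matters compared with the pattern in the question is the reluctant quantifier: a greedy (.*) consumes as much as it can and backtracks to the last closing tag, while (.*?) stops at the first one it can reach. A small sketch to see the difference side by side; the class name GreedyVsReluctantSketch, the helper bodies and the shortened input are made up for this demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named for this example only.
class GreedyVsReluctantSketch {

    // Collects group 1 of every match of the given regex in the input.
    static List<String> bodies(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<item><title>Good</title></item><item><title>Bad</title></item>";

        // Greedy: one match whose group runs up to the LAST </item>.
        System.out.println(bodies("<item.*?>(.*)</item>", input).size()); // prints 1

        // Reluctant: one match per item, each ending at the FIRST </item> after it.
        System.out.println(bodies("<item.*?>(.*?)</item>", input));
        // prints [<title>Good</title>, <title>Bad</title>]
    }
}
```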

Also, here's a Java sample:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestRegex {
    public static void main(String[] args) {
        String str = "<item value=\"key\" atr='none'><date><date><title val=\"has value\">Good</title><link>www</link></item><item value=\"key\" atr='none'><title val=\"has value\">Bad</title><link>http</link><author></author></item><item value=\"key\" atr='none'><title val=\"has value\">Neutral</title><link>ftp</link></item>";

        // The reluctant (.*?) stops at the first </item>, so each item is matched separately
        Pattern pattern = Pattern.compile("<item.*?>(.*?)<\\/item>");

        Matcher match = pattern.matcher(str);

        // group(1) is the content between the opening and closing item tags
        while(match.find()) {
            System.out.println(match.group(1));
        }
    }
}

Output

<date><date><title val="has value">Good</title><link>www</link>
<title val="has value">Bad</title><link>http</link><author></author>
<title val="has value">Neutral</title><link>ftp</link>
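
If you also need the individual tags inside each item, in whatever order they appear, one option is to run a second pattern over each captured item body. The following is only a sketch under a few assumptions: the names ItemFieldsSketch, ITEM, CHILD and parse are invented here, child tags are assumed not to nest or repeat within an item, and CDATA sections and self-closing tags are not handled. Pattern.DOTALL is added because "." does not match line breaks by default, and real feeds are rarely on a single line.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class ItemFieldsSketch {

    // One item: opening tag with optional attributes, reluctant body, closing tag.
    private static final Pattern ITEM = Pattern.compile(
            "<item\\b[^>]*>(.*?)</item>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Any child element: group 1 is the tag name, group 2 its content.
    // The backreference \1 ties the closing tag to the opening one.
    private static final Pattern CHILD = Pattern.compile(
            "<([\\w:-]+)(?:\\s[^>]*)?>(.*?)</\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one map per item: tag name -> trimmed content, in order of appearance.
    static List<Map<String, String>> parse(String feed) {
        List<Map<String, String>> items = new ArrayList<>();
        Matcher item = ITEM.matcher(feed);
        while (item.find()) {
            Map<String, String> fields = new LinkedHashMap<>();
            Matcher child = CHILD.matcher(item.group(1));
            while (child.find()) {
                fields.put(child.group(1), child.group(2).trim());
            }
            items.add(fields);
        }
        return items;
    }

    public static void main(String[] args) {
        String feed = "<item value=\"key\">\n  <link>www</link>\n  <title val=\"has value\">Good</title>\n</item>\n"
                + "<item>\n  <date>today</date>\n  <title>Bad</title>\n  <link>http</link>\n</item>";
        for (Map<String, String> fields : parse(feed)) {
            System.out.println(fields);
        }
    }
}
```

Each map keeps the tags in the order they were found, so the caller can look up title, link, date and so on by name without caring about their position. Anything beyond this simple shape is where an XML parser becomes the safer choice.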
