
Java regex extract data between a href tags

I am trying to extract the data between a href tags in a Java string. I can achieve this with replaceAll and substring, or by using indexOf, etc.

I would like to know how I can get the data using a regex.

So basically I am trying to extract the data and store it in a string or in a list.

String data = "12345";
String sampleStr = "";
for (int i = 0; i < 10; i++) {
    data += i;
    sampleStr += "<a href=\"javascript:yyy_getDetail(\'" + data + "\')\">" + data + "</a>" + ", ";
}

System.out.println(sampleStr);
String temp = sampleStr.substring(sampleStr.indexOf("\">") + 2);

Any suggestions in this regard will be appreciated. What should the regex be so that I extract only the data?

Here is an example for your needs. Note that the full match will contain the string including the anchor tags; the content you are looking for is in group 1.

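import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Build the sample input: ten anchor tags separated by ", "
String data = "12345";
String sampleStr = "";
for (int i = 0; i < 10; i++)
{
    data += i;
    sampleStr += "<a href=\"javascript:yyy_getDetail(\'" + data + "\')\">" + data + "</a>" + ", ";
}

// Group 1 lazily captures the text between the opening <a ...> tag and the closing </a>
Pattern pattern = Pattern.compile("<a[^>]*>(.*?)</a>");
Matcher matcher = pattern.matcher(sampleStr);
while (matcher.find())
{
    System.out.println("Result " + matcher.group(1));
}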
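Since the question also asks about storing the results in a list, the same loop can collect group 1 instead of printing it. The following is only a minimal sketch: the class name `AnchorTextExtractor`, the method name `extractAnchorTexts`, and the shortened sample string are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this sketch.
class AnchorTextExtractor {
    // Same pattern as in the answer: group 1 is the text between <a ...> and </a>.
    private static final Pattern ANCHOR = Pattern.compile("<a[^>]*>(.*?)</a>");

    // Returns the text of every anchor found in the input, in order of appearance.
    static List<String> extractAnchorTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        // Two anchors in the same shape as the question's sampleStr.
        String sample = "<a href=\"javascript:yyy_getDetail('123450')\">123450</a>, "
                      + "<a href=\"javascript:yyy_getDetail('1234501')\">1234501</a>, ";
        System.out.println(extractAnchorTexts(sample)); // prints [123450, 1234501]
    }
}
```

Note that this pattern is case-sensitive and `.` does not match line breaks by default, so anchors written as `<A ...>` or anchor text spanning several lines would be missed unless flags such as `Pattern.CASE_INSENSITIVE` and `Pattern.DOTALL` are added.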

Please use an HTML/XML parser instead. Your life will be much easier.

Real-world HTML is often very inconsistent, so you can't be sure that a regex will match it the way you want.

There's actually a famous answer about this, at "RegEx match open tags except XHTML self-contained tags".

If you choose to use an HTML/XML parser, take a look at "Best XML parser for Java" for your options :)
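To make the parser suggestion concrete, here is a rough sketch that collects anchor text with a callback-based HTML parser. It uses the Swing HTML parser bundled with the JDK (`javax.swing.text.html.parser.ParserDelegator`) only so that the example has no external dependency; that parser is dated, and a dedicated HTML library is probably the better choice in practice. The class name `AnchorTextParser` and the method name `anchorTexts` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; the name is an assumption for this sketch.
class AnchorTextParser {
    // Returns the text content of every <a> element in the given HTML fragment.
    static List<String> anchorTexts(String html) {
        List<String> result = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private StringBuilder current; // non-null only while inside an <a> element

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] text, int pos) {
                if (current != null) {
                    current.append(text); // text outside anchors is ignored
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && current != null) {
                    result.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "<a href=\"javascript:yyy_getDetail('123450')\">123450</a>, "
                      + "<a href=\"javascript:yyy_getDetail('1234501')\">1234501</a>, ";
        System.out.println(anchorTexts(sample));
    }
}
```

Unlike the regex, the callback approach does not depend on how the tag is written (attribute order, letter case, or line breaks inside the element), which is the main reason for preferring a parser here.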
