What is the correct regular expression to extract href from an HTML line? (Java)

Question

Hi, I am trying to extract the text that an href attribute defines in a line of HTML. For example:

<link rel="stylesheet" href="style.css" type="text/css">

I want to get "style.css", or:

<a href="target0.html"><img align="center" src="thumbnails/image001.jpg" width="154" height="99">

I want to get "target0.html".

What would be the correct Java code to do this?

Answer 1

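    public static String getHref(String str)
    {
        // Look for the attribute name together with its opening double quote.
        final String marker = "href=\"";
        int startIndex = str.indexOf(marker);
        if (startIndex < 0)
            return "";
        int valueStart = startIndex + marker.length();
        int valueEnd = str.indexOf('"', valueStart);
        // No closing quote: report "not found" instead of throwing.
        if (valueEnd < 0)
            return "";
        return str.substring(valueStart, valueEnd);
    }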

This method assumes that the HTML is well formed (the value is double-quoted and follows href= directly), and it only returns the first href in the string, but I'm sure you can extrapolate from here.
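One way to extrapolate to every href in a string is to repeat the same indexOf search from the end of the previous match. The following is only a sketch under the same assumptions as the method above (double-quoted values, no space around the equals sign); the class name `AllHrefs` and the method name `getAllHrefs` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class AllHrefs {
    // Collects every double-quoted href value, in order of appearance.
    static List<String> getAllHrefs(String html) {
        final String marker = "href=\"";
        List<String> result = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(marker, from);
            if (start < 0) {
                break; // no further href attribute
            }
            int valueStart = start + marker.length();
            int valueEnd = html.indexOf('"', valueStart);
            if (valueEnd < 0) {
                break; // unterminated value: stop rather than throw
            }
            result.add(html.substring(valueStart, valueEnd));
            from = valueEnd + 1; // continue after the closing quote
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<link rel=\"stylesheet\" href=\"style.css\" type=\"text/css\">"
                + "<a href=\"target0.html\"><img src=\"thumbnails/image001.jpg\"></a>";
        System.out.println(getAllHrefs(html)); // prints [style.css, target0.html]
    }
}
```

Like the single-value version, this does not look at tag names, so it would also pick up an href on any other element.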

Answer 2

I realize you asked about using regular expressions, but jsoup makes this simple and is much less error-prone:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HrefExtractor {
    public static void main(final String[] args) {
        // Parse the HTML fragment from a string; no checked exceptions are thrown here.
        final Document document = Jsoup.parse("<a href=\"target0.html\"><img align=\"center\" src=\"thumbnails/image001.jpg\" width=\"154\" height=\"99\">");
        // Select every <a> element that carries an href attribute.
        final Elements links = document.select("a[href]");
        for (final Element element : links) {
            System.out.println(element.attr("href"));
        }
    }
}

Answer 3

I have not tried the following, but it should be something like this:

    Pattern.compile("<(?:link|a\\s+)[^>]*href=\"(.*?)\"")

But I'd recommend using one of the available HTML (or even XML) parsers for this task.
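If a regular expression is still preferred, a variant of the pattern in this answer can be exercised with `java.util.regex` as sketched below. The class name `HrefRegex`, the method name `extractHrefs`, the case-insensitive flag, and the use of `[^"]*` for the value are choices made for this illustration, not part of the original answer. It assumes double-quoted attribute values and will not cope with single quotes, unquoted values, or attributes such as data-href.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefRegex {
    // "<a" or "<link", whitespace, any other attributes, then href="...".
    // Capture group 1 holds the attribute value.
    private static final Pattern HREF = Pattern.compile(
            "<(?:link|a)\\s+[^>]*?href=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the href values of all <a> and <link> tags found in the input.
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            hrefs.add(matcher.group(1));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String link = "<link rel=\"stylesheet\" href=\"style.css\" type=\"text/css\">";
        String anchor = "<a href=\"target0.html\"><img align=\"center\" "
                + "src=\"thumbnails/image001.jpg\" width=\"154\" height=\"99\">";
        System.out.println(extractHrefs(link));   // prints [style.css]
        System.out.println(extractHrefs(anchor)); // prints [target0.html]
    }
}
```

The `[^>]*?` part keeps the match inside a single tag, which is why the src attribute of the nested img element is never mistaken for part of the anchor.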

3 answers

Answer 1: score 1, accepted, 2011-11-21 18:36:07
Answer 2: score 1, 2011-11-21 18:45:16
Answer 3: score 0, 2011-11-21 18:32:04
