Can jsoup handle meta refresh redirects?

Question

I have a problem with jsoup. I am trying to fetch a document from a URL that redirects to another URL through a meta refresh, and the redirect is not followed. For example, http://www.amerisourcebergendrug.com automatically redirects to http://www.amerisourcebergendrug.com/abcdrug/ via its meta refresh URL, but jsoup stays on http://www.amerisourcebergendrug.com instead of following the redirect and fetching http://www.amerisourcebergendrug.com/abcdrug/.

Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").get();

I have also tried:

Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").followRedirects(true).get();

But neither of them works.

Is there any workaround for this?

Update: the page may use a meta refresh redirect.

Answer 1

Update (case-insensitive and fairly fault-tolerant):

The content attribute is parsed (almost) according to the spec.
The first content value that parses successfully is used.

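import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MetaRefresh {

    public static void main(String[] args) throws Exception {

        URI uri = URI.create("http://www.amerisourcebergendrug.com");
        Document d = Jsoup.connect(uri.toString()).get();

        // jsoup compares attribute values case-insensitively, so "Refresh" matches too
        for (Element refresh : d.select("html head meta[http-equiv=refresh]")) {

            // content is either "<seconds>" or "<seconds>; url=<target>"
            Matcher m = Pattern.compile("(?si)\\d+;\\s*url=(.+)|\\d+")
                               .matcher(refresh.attr("content"));

            // use the first one that is valid
            if (m.matches()) {
                // a bare "<seconds>" only reloads the current page, so there is nothing to follow
                if (m.group(1) != null)
                    d = Jsoup.connect(uri.resolve(m.group(1).trim()).toString()).get();
                break;
            }
        }

        // print the URL of the document that was finally fetched
        System.out.println(d.location());
    }
}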

This correctly outputs:

http://www.amerisourcebergendrug.com/abcdrug/
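The regex above accepts only `<seconds>;url=<target>`. As far as I know, the HTML refresh syntax is a little looser: a `,` may stand in for the `;`, the `url=` prefix is optional, and the target may be wrapped in quotes. Below is a stdlib-only sketch of a more tolerant parser for the `content` value. The class name `RefreshContent` and the simplified grammar are assumptions for illustration; this is neither jsoup API nor the full spec algorithm.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RefreshContent {
    // Simplified refresh grammar (an assumption, not the full spec algorithm):
    //   <delay>[.fraction] [ (";" | ",") [url=] <target> ]
    private static final Pattern REFRESH = Pattern.compile(
            "(?is)\\s*(\\d+)(?:\\.[\\d.]*)?\\s*(?:[;,]\\s*(?:url\\s*=\\s*)?(.*))?");

    /** Returns the redirect target, or empty if the value is invalid or only reloads the page. */
    public static Optional<String> target(String content) {
        Matcher m = REFRESH.matcher(content);
        if (!m.matches() || m.group(2) == null) {
            return Optional.empty();
        }
        String url = m.group(2).trim();
        // Strip one pair of matching surrounding quotes, e.g. url='/abcdrug/'
        if (url.length() >= 2) {
            char first = url.charAt(0);
            if ((first == '\'' || first == '"') && url.charAt(url.length() - 1) == first) {
                url = url.substring(1, url.length() - 1).trim();
            }
        }
        return url.isEmpty() ? Optional.empty() : Optional.of(url);
    }

    public static void main(String[] args) {
        System.out.println(target("0; URL=/abcdrug/"));   // Optional[/abcdrug/]
        System.out.println(target("5"));                  // Optional.empty
    }
}
```

Inside the jsoup loop, `RefreshContent.target(refresh.attr("content"))` could take the place of the `Matcher`; the returned target still has to be resolved against the page URL before fetching.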

Old answer:

Are you sure that it isn't working? For me:

System.out.println(Jsoup.connect("http://www.ibm.com").get().baseUri());

This correctly outputs http://www.ibm.com/us/en/.
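The updated code in this answer follows a single meta refresh. If the target page refreshes again, the same step can be repeated with a hop limit and a visited set, so a page that refreshes to itself cannot loop forever. This is a stdlib-only sketch: `refreshTargetOf` is a hypothetical hook standing in for "fetch the page with jsoup and read its meta refresh target", and `RefreshChain` is a made-up class name.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

public class RefreshChain {
    /**
     * Follows meta refresh hops from startUrl and returns the last URL reached.
     * refreshTargetOf (hypothetical) maps a page URL to its raw refresh target,
     * possibly relative, or to empty if the page has no meta refresh.
     */
    public static String follow(String startUrl,
                                Function<String, Optional<String>> refreshTargetOf,
                                int maxHops) {
        Set<String> seen = new HashSet<>();
        String current = startUrl;
        for (int hop = 0; hop < maxHops; hop++) {
            if (!seen.add(current)) {
                break; // already visited: the refreshes form a loop
            }
            Optional<String> target = refreshTargetOf.apply(current);
            if (!target.isPresent()) {
                break; // no meta refresh: this is the final page
            }
            // Resolve relative targets such as "/abcdrug/" against the current page
            current = URI.create(current).resolve(target.get()).toString();
        }
        return current;
    }

    public static void main(String[] args) {
        // Fake refresh map standing in for real fetches (an assumption for the demo)
        Map<String, String> refreshes = new HashMap<>();
        refreshes.put("http://www.amerisourcebergendrug.com", "/abcdrug/");
        System.out.println(follow("http://www.amerisourcebergendrug.com",
                url -> Optional.ofNullable(refreshes.get(url)), 5));
        // prints http://www.amerisourcebergendrug.com/abcdrug/
    }
}
```

Keeping the fetching behind a function makes the loop logic testable without network access; in real use the hook would wrap `Jsoup.connect(url).get()` and the meta tag lookup.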

Answer 2

For better error handling, and to avoid case-sensitivity problems:

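import java.io.IOException;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PageTitle
{
    // e.g. fetchTitle("http://www.ibm.com")
    static String fetchTitle(String url)
    {
        try
        {
            Document doc = Jsoup.connect(url).get();
            // select() returns an empty list (never null) when nothing matches
            for (Element meta : doc.select("html head meta[http-equiv]"))
            {
                // attr() returns "" (never null) when the attribute is missing
                String lvHttpEquiv = meta.attr("http-equiv");
                if (lvHttpEquiv.toLowerCase().contains("refresh"))
                {
                    String lvContent = meta.attr("content");
                    // split at the first '=' only, so a query string in the target URL stays intact
                    String[] lvContentArray = lvContent.split("=", 2);
                    if (lvContentArray.length > 1)
                    {
                        // resolve relative targets such as "/abcdrug/" against the current page
                        URL target = new URL(new URL(doc.location()), lvContentArray[1].trim());
                        doc = Jsoup.connect(target.toString()).get();
                    }
                    break;
                }
            }

            // get page title
            return doc.title();
        }
        catch (IOException e)
        {
            // also catches MalformedURLException thrown by new URL(...)
            e.printStackTrace();
            return null;
        }
    }
}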
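`String.split("=")` with no limit splits at every `=`, so a refresh target whose URL carries a query string gets cut apart; passing a limit of 2 splits only at the first `=`. A stdlib-only illustration follows, where `SplitLimit` and `afterFirstEquals` are hypothetical names and the sample URL is made up.

```java
import java.util.Arrays;

public class SplitLimit {
    /** Returns everything after the first '=', trimmed, or null if there is no '='. */
    public static String afterFirstEquals(String content) {
        String[] parts = content.split("=", 2); // limit 2: split at the first '=' only
        return parts.length > 1 ? parts[1].trim() : null;
    }

    public static void main(String[] args) {
        String content = "0;url=http://www.example.com/page?a=1&b=2";

        // Without a limit, every '=' splits, so the query string is broken apart
        System.out.println(Arrays.toString(content.split("=")));
        // [0;url, http://www.example.com/page?a, 1&b, 2]

        System.out.println(afterFirstEquals(content));
        // http://www.example.com/page?a=1&b=2
    }
}
```

This only fixes the splitting; the text before the `=` is still not checked for the `url` keyword, so a more tolerant parser of the whole `content` value is the safer route.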

Answer 1: accepted, score 12, posted 2011-09-08 11:47:49
Answer 2: score 2, posted 2014-08-18 02:42:31