Can jsoup handle meta refresh redirects?
I have a problem using jsoup. I am trying to fetch a document from a URL that redirects to another URL via a meta refresh, but it is not working. To be concrete: the site http://www.amerisourcebergendrug.com automatically redirects to http://www.amerisourcebergendrug.com/abcdrug/ through its meta refresh URL, but jsoup stays on http://www.amerisourcebergendrug.com instead of following the redirect and fetching http://www.amerisourcebergendrug.com/abcdrug/.
Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").get();
I have also tried:
Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").followRedirects(true).get();
but neither works.
Is there a workaround for this?
Update: the page may use a meta refresh redirect, which you can follow manually:
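import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MetaRefreshExample {
    public static void main(String[] args) throws Exception {
        URI uri = URI.create("http://www.amerisourcebergendrug.com");
        Document d = Jsoup.connect(uri.toString()).get();
        for (Element refresh : d.select("html head meta[http-equiv=refresh]")) {
            // content is either "<seconds>; url=<target>" or just "<seconds>"
            Matcher m = Pattern.compile("(?si)\\d+;\\s*url=(.+)|\\d+")
                    .matcher(refresh.attr("content"));
            // find the first one that is valid
            if (m.matches()) {
                if (m.group(1) != null)
                    // resolve the target against the original URL, then fetch it
                    d = Jsoup.connect(uri.resolve(m.group(1)).toString()).get();
                break;
            }
        }
        // print the URL of the document that was finally fetched
        System.out.println(d.baseUri());
    }
}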
Outputs correctly:
http://www.amerisourcebergendrug.com/abcdrug/
Are you sure that it isn't working? For me:
System.out.println(Jsoup.connect("http://www.ibm.com").get().baseUri());
correctly outputs http://www.ibm.com/us/en/
For better error handling, and to avoid case-sensitivity problems when matching http-equiv:
try
{
    Document doc = Jsoup.connect("http://www.ibm.com").get();
    // check each meta element so http-equiv and content come from the same tag
    for (Element meta : doc.select("html head meta"))
    {
        // attr() returns "" (never null) when the attribute is missing
        String lvHttpEquiv = meta.attr("http-equiv");
        if (lvHttpEquiv.toLowerCase().contains("refresh"))
        {
            // content looks like "0; url=http://..."; limit 2 keeps any '=' inside the URL
            String[] lvContentArray = meta.attr("content").split("=", 2);
            if (lvContentArray.length > 1)
            {
                doc = Jsoup.connect(lvContentArray[1].trim()).get();
                break;
            }
        }
    }
    // get page title
    return doc.title();
}
catch (IOException e)
{
    e.printStackTrace();
}
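Both answers pull the redirect target out of the meta tag's content attribute, one with a regex and one with split. Real pages also write the value as URL='...' with quotes or in upper case, so it may help to isolate that parsing step. The following is only a minimal sketch under those assumptions: the class MetaRefresh and the method parseTarget are hypothetical names, not part of jsoup, and the sketch uses nothing beyond the Java standard library.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of jsoup): parses the content attribute
// of a <meta http-equiv="refresh"> tag.
final class MetaRefresh {
    // "<seconds>; url=<target>", matched case-insensitively; surrounding whitespace ignored
    private static final Pattern WITH_URL = Pattern.compile(
            "\\s*\\d+\\s*;\\s*url\\s*=\\s*(.+?)\\s*",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the redirect target resolved against base, or empty if content has no usable URL.
    static Optional<URI> parseTarget(URI base, String content) {
        Matcher m = WITH_URL.matcher(content);
        if (!m.matches()) {
            return Optional.empty(); // delay only (plain reload) or unrecognised format
        }
        String target = m.group(1);
        // strip one pair of matching quotes, e.g. url='http://...'
        if (target.length() >= 2) {
            char first = target.charAt(0);
            char last = target.charAt(target.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                target = target.substring(1, target.length() - 1).trim();
            }
        }
        if (target.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(base.resolve(target));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // target is not a valid URI
        }
    }
}
```

With jsoup, one would presumably call MetaRefresh.parseTarget(uri, refresh.attr("content")) inside the loop from the first answer and pass the result to Jsoup.connect(...) only when it is present.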