Scraping a website that may require login - JSOUP

Question

I am trying to scrape a website that possibly requires authentication. When I run the code below, I get this error:

org.jsoup.UnsupportedMimeTypeException: Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml. Mimetype=application/json; charset=utf-8, URL=https://sso.mims.com/Account/Signin
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:547)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
    at com.aiingov.proc.MedScraper.main(MedScraper.java:49)

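    import java.io.IOException;

    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class MedScraper {

        public static void main(String[] args) throws IOException {
            String url = "https://sso.mims.com/Account/Signin";
            String userAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36";

            // Fetch the sign-in page first to pick up the session cookies.
            Connection.Response response = Jsoup.connect(url)
                    .userAgent(userAgent)
                    .method(Connection.Method.GET)
                    .execute();

            // Post the credentials, sending the cookies from the first request.
            response = Jsoup.connect(url)
                    .cookies(response.cookies())
                    .data("action", "login")
                    .data("login", "xxxxx")
                    .data("password", "xxxxx")
                    .data("auto_login", "1")
                    .userAgent(userAgent)
                    .method(Connection.Method.POST)
                    .followRedirects(true)
                    .execute();

            // Request the protected page with the cookies from the login response.
            Document document = Jsoup.connect("https://www.mims.com/india/drug/info/abacavir/abacavir?type=full&mtype=generic")
                    .cookies(response.cookies())
                    .userAgent(userAgent)
                    .get();

            System.out.println(document);

            // Print the text owned directly by each element in the body.
            Elements elements = document.body().select("*");

            for (Element element : elements) {
                System.out.println(element.ownText());
            }
        }
    }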

Without the login code in place I get the following output:

You will be redirected to your destination shortly.

How do I fix this?

Answer 1

Try using the ignoreContentType method.

    Jsoup.connect(url).ignoreContentType(true); // chain any other methods

Description of the method from the JSoup Docs:

Ignore the document's Content-Type when parsing the response. By default this is false, an unrecognised content-type will cause an IOException to be thrown. (This is to prevent producing garbage by attempting to parse a JPEG binary image, for example.) Set to true to force a parse attempt regardless of content type.
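To see why the sign-in response trips this check, here is a rough plain-Java sketch of the rule as the exception message states it (text/*, application/xml, or application/xhtml+xml). The class ContentTypeCheck and the method isParseableByDefault are hypothetical names made up for illustration. This is not jsoup's actual code, and the handling of a missing header is an assumption.

```java
import java.util.Locale;

// Hypothetical illustration only: mirrors the rule quoted in the exception message,
// not jsoup's real implementation.
class ContentTypeCheck {

    // Returns true when the Content-Type header names one of the types listed in
    // the error message: text/*, application/xml, or application/xhtml+xml.
    static boolean isParseableByDefault(String contentType) {
        if (contentType == null) {
            // Assumption: a response without a Content-Type header is not rejected.
            return true;
        }
        // Drop parameters such as "; charset=utf-8" and normalise the case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.startsWith("text/")
                || mime.equals("application/xml")
                || mime.equals("application/xhtml+xml");
    }

    public static void main(String[] args) {
        // An ordinary HTML page passes the check.
        System.out.println(isParseableByDefault("text/html; charset=utf-8"));
        // The Content-Type from the stack trace in the question does not.
        System.out.println(isParseableByDefault("application/json; charset=utf-8"));
    }
}
```

The second call uses the Mimetype reported in the stack trace, which is why the request fails unless ignoreContentType(true) is set. Since that response is JSON rather than HTML, it is probably more useful to read it as a string with response.body() than to parse it into a Document.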

Answered 2018-03-09 18:31:38