
Parse a URL and retrieve information

I need to extract the category of a Google Play app. For example, Facebook falls under the category "Social".

So I need to extract "Social" from the app's Play Store page. I can get the page's HTML into a String called "result" with the code below, but I cannot find the tag that contains the category name. I can see the category name when I use Inspect Element in the browser, but not in the HTML my code receives. How do I get the full HTML content of the page? The response my code gets does not contain it. In the inspector, the category name sits under html > head > script > body > div > "Category Name".

When I read the complete HTML response, I only get the following elements: <html>, <head> and <script>. I do not get the <body> element or its contents. Why is the body of the page not being returned?

The following code outputs the HTML response of the queried page.

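import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

String url = "https://play.google.com/store/apps/details?id=com.kongregate.mobile.fly.google&hl=en";
String result = "";

try {

    // create HttpClient (Apache HttpClient 4.x; DefaultHttpClient is deprecated since 4.3)
    HttpClient httpclient = new DefaultHttpClient();

    // make GET request to the given URL
    HttpResponse httpResponse = httpclient.execute(new HttpGet(url));
    HttpEntity entity = httpResponse.getEntity();

    // convert InputStream to String; the entity stream can be read only once,
    // so there must be no EntityUtils.toString(entity) call before this
    if (entity != null) {
        InputStream inputStream = entity.getContent();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                sb.append(line).append('\n'); // keep the line breaks readLine() strips
            }
        }
        result = sb.toString();
    }
    // ...
} catch (IOException e) {
    e.printStackTrace();
}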
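To tell whether <body> is really missing from the response or merely absent from what gets printed, one can inspect the string itself. The sketch below uses only the JDK; ResponseBodyCheck, readAll and hasBody are hypothetical names, and in a test a ByteArrayInputStream can stand in for the HTTP entity stream. It also illustrates why an extra EntityUtils.toString(...) call before getContent() is a problem: once a stream has been read to the end, reading it again yields nothing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ResponseBodyCheck {

    // Reads the stream to its end once, keeping the line breaks that
    // readLine() strips, and collects the text in a StringBuilder instead
    // of repeated String concatenation. The caller owns (and closes) the stream.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // Checks the downloaded text itself, not what a console or log displays.
    static boolean hasBody(String html) {
        return html.toLowerCase(Locale.ROOT).contains("<body");
    }
}
```

In the question's code this would be used as result = ResponseBodyCheck.readAll(entity.getContent()), without any EntityUtils.toString(...) call on the same entity beforehand, followed by a check of ResponseBodyCheck.hasBody(result) and result.length().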

Maybe this helps: the following code returns the entire page as a Jsoup Document:

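import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String source = "https://play.google.com/store/apps/details?id=com.kongregate.mobile.fly.google&hl=en"; // the page from the question
Document html = null;
try {
    html = Jsoup.connect(source).get(); // download the page and parse it into a DOM
} catch (final IOException e) {
    LOG.error(e.getMessage(), e); // LOG: your logger, e.g. an SLF4J Logger
}
if (html != null) {
    LOG.info(html.outerHtml());
}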

This uses Jsoup.

I did not find your "Category Name" node, but maybe you can find it again ;) You can search the Document with a CSS selector like this:

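// "#someId" is a placeholder: "#x" matches the element with id="x".
// Note that "#Category Name" would instead match <Name> elements inside id="Category".
html.select("#someId");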

The Jsoup documentation has more selector examples.
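If adding a dependency is not an option, the same page can be downloaded with the HTTP client built into Java 11+ (java.net.http). This is a sketch under that assumption; PlayPageFetcher, detailsUri, detailsRequest and fetch are hypothetical names, and the URL shape is copied from the question. It only downloads the raw HTML: if the category is filled in by JavaScript in the browser, it will not appear in this string either, just as with Apache HttpClient or Jsoup.connect(...).

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class PlayPageFetcher {

    // Builds a details-page URL in the same shape as the one in the question:
    // https://play.google.com/store/apps/details?id=<package>&hl=<language>
    static URI detailsUri(String packageName, String language) {
        String query = "id=" + URLEncoder.encode(packageName, StandardCharsets.UTF_8)
                + "&hl=" + URLEncoder.encode(language, StandardCharsets.UTF_8);
        return URI.create("https://play.google.com/store/apps/details?" + query);
    }

    // Builds a plain GET request; nothing is sent until fetch() is called.
    static HttpRequest detailsRequest(String packageName, String language) {
        return HttpRequest.newBuilder(detailsUri(packageName, language))
                .GET()
                .build();
    }

    // Downloads the whole response body as one String (needs network access).
    static String fetch(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

A call such as fetch(HttpClient.newHttpClient(), detailsRequest("com.kongregate.mobile.fly.google", "en")) returns the raw HTML as a String, which can then be handed to an HTML parser such as Jsoup.parse(...) and searched with a selector.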
