Jsoup Scraping HTML dynamic content

I'm new to Jsoup and have been trying to write a small piece of code that gets the names of the items in a Steam inventory using Jsoup.

public Element getItem(String user) throws IOException{
    Document doc;

    doc = Jsoup.connect("http://steamcommunity.com/id/"+user+"/inventory").get();
    Element element = doc.getElementsByClass("hover_item_name").first();
    return element;
}

This method returns:

<h1 class="hover_item_name" id="iteminfo0_item_name"></h1>

and I want the text between the h1 tags, which is only generated when you click on a specific window. Thank you in advance.

You can use the .select(String cssQuery) method:

doc.select("h1") gives you all h1 elements. If you need the actual text inside these tags, call .text() on each Element. If you need an attribute such as class or id, call .attr(String attributeKey) on an Element, e.g.:

doc.getElementsByClass("hover_item_name").first().attr("id")

gives you "iteminfo0_item_name".
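
As a rough sketch of the calls mentioned above, the snippet below runs them against a hard-coded HTML string rather than the live Steam page. The class name SelectDemo, the helper method names, and the "Example Item" text are made up for illustration; only the h1 markup shape is taken from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SelectDemo {

    // Returns the text of every <h1> element in the given HTML, in document order.
    public static List<String> h1Texts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element h1 : doc.select("h1")) {
            texts.add(h1.text());
        }
        return texts;
    }

    // Returns the id attribute of the first element with class "hover_item_name",
    // or an empty string if no such element exists.
    public static String firstItemNameId(String html) {
        Element first = Jsoup.parse(html).getElementsByClass("hover_item_name").first();
        return first == null ? "" : first.attr("id");
    }

    public static void main(String[] args) {
        // Illustrative markup; the real page leaves this <h1> empty until a click.
        String html = "<h1 class=\"hover_item_name\" id=\"iteminfo0_item_name\">Example Item</h1>";
        System.out.println(h1Texts(html));
        System.out.println(firstItemNameId(html));
    }
}
```

Checking first() for null matters because it returns null when nothing matches the class name.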

But if you need to perform clicks on a website, you can't do that with Jsoup, because Jsoup is an HTML parser and not a browser alternative. It does not execute JavaScript, so it can't handle dynamically generated content.

What you could do instead is first scrape the relevant data from your h1 tags and then send a new .post() request yourself, i.e. reproduce the AJAX call that the page makes.
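
A minimal sketch of that idea, assuming you have already identified the request the page sends (for example in the browser's network tab). The endpoint URL passed in and the "itemid" parameter are hypothetical placeholders, not Steam's actual API. ignoreContentType(true) is set because Jsoup otherwise rejects non-HTML responses such as JSON.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

import java.io.IOException;

public class AjaxRequestDemo {

    // Builds (but does not send) a POST request that mimics the page's AJAX call.
    public static Connection buildRequest(String endpoint, String itemId) {
        return Jsoup.connect(endpoint)
                .method(Connection.Method.POST)
                .data("itemid", itemId)      // hypothetical form parameter
                .ignoreContentType(true);    // allow JSON or other non-HTML responses
    }

    // Sends the request and returns the raw response body for further parsing.
    public static String fetchBody(String endpoint, String itemId) throws IOException {
        return buildRequest(endpoint, itemId).execute().body();
    }
}
```

If the endpoint returns an HTML fragment, the body can be passed to Jsoup.parse(...) and queried as usual; if it returns JSON, a JSON library is the better fit.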

If you would rather use a real WebDriver, have a look at Selenium.

Use .text() and return a String, i.e.:

public String getItem(String user) throws IOException{
    Document doc;
    doc = Jsoup.connect("http://steamcommunity.com/id/"+user+"/inventory").get();
    Element element = doc.getElementsByClass("hover_item_name").first();
    String text = element.text();
    return text;
}
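
Note that for the markup shown in the question, the h1 is empty in the HTML that Jsoup receives, so .text() yields an empty string there. The sketch below illustrates this on a hard-coded copy of that markup and also guards against the element being absent. The class name ItemNameDemo, the method name, and the "Example Item" text are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

import java.util.Optional;

public class ItemNameDemo {

    // Returns the item name only if the element exists and contains non-blank text.
    public static Optional<String> itemName(String html) {
        Element element = Jsoup.parse(html).getElementsByClass("hover_item_name").first();
        if (element == null || !element.hasText()) {
            return Optional.empty();
        }
        return Optional.of(element.text());
    }

    public static void main(String[] args) {
        // Markup as returned in the question: the <h1> has no content yet.
        String empty = "<h1 class=\"hover_item_name\" id=\"iteminfo0_item_name\"></h1>";
        // Same element after the page's script would have filled it in (made-up text).
        String filled = "<h1 class=\"hover_item_name\" id=\"iteminfo0_item_name\">Example Item</h1>";
        System.out.println(itemName(empty).isPresent());
        System.out.println(itemName(filled).orElse(""));
    }
}
```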

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 