How can I extract web app content from html code?

Question

I'm trying to gather data from CS:GO gambling sites in order to analyze them. I wrote a very short program that fetches the HTML of this website, but the result does not contain the content of the web app, which is the information I need. I can view it in Chrome, so I assume there is a solution. Maybe the pictures help to show what I'm looking for:

HTML code; marked the line I want

import java.io.IOException;
import org.jsoup.Jsoup;

public class Main {

    public static void main(String[] args) {
        try {
            // Fetch the page and print the HTML exactly as the server returned it.
            // jsoup does not execute JavaScript, so nothing rendered client-side appears here.
            String html = Jsoup.connect("https://www.wtfskins.com/crash").get().html();
            System.out.println(html);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

This is what I get. I need the content of <app-root>, but it only contains a placeholder:

<body> <app-root> 
  loading... // That's the problem
 </app-root> 
 <script src="https://code.jquery.com/jquery-3.1.1.min.js" integrity="sha256-hVVnYaiADRTO2PzUGmuLJr8BLUSjGIZsDYGmIJLv2b8=" crossorigin="anonymous"></script> 
 <script src="https://cdnjs.cloudflare.com/ajax/libs/tether/1.4.0/js/tether.min.js" integrity="sha384-DztdAPBWPRXSA/3eYEEUWrWCy7G5KFbe8fFjk5JAIxUYHKkDx6Qin1DkWx51bBrb" crossorigin="anonymous"></script> 
 <script src="/assets/js/jquery-ui.min.js"></script> 
 <script src="/assets/js/bootstrap.js"></script> 
 <script src="/assets/js/sha3.js"></script> 
 <script src="/assets/js/sha256.js"></script> 
 <script type="text/javascript" src="inline.318b50c57b4eba3d437b.bundle.js"></script> 
 <script type="text/javascript" src="polyfills.2b75d68d2d6cb678fc8d.bundle.js"></script> 
 <script type="text/javascript" src="main.7932c68952979c366236.bundle.js"></script>  
</body>

Answer 1

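The data is loaded into the page after the initial DOM has been delivered. When you fetch the page with jsoup, you only get the response to the initial HTML request; jsoup does not execute the JavaScript that fills in the rest.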

This image shows that the HTML request really returns an almost empty HTML structure
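To make the "empty shell" point concrete, here is a minimal sketch that checks whether the fetched HTML still has only the placeholder inside <app-root>. It uses a regex and nothing outside the JDK so it runs on its own; the class name `ShellCheck` and both helper methods are made up for illustration, and the sample HTML is a shortened copy of the output in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShellCheck {

    // Matches the <app-root> element and captures whatever sits between its tags.
    private static final Pattern APP_ROOT = Pattern.compile(
            "<app-root[^>]*>(.*?)</app-root>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the trimmed content of <app-root>, or empty if the element is absent. */
    static Optional<String> appRootText(String html) {
        Matcher m = APP_ROOT.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    /** Heuristic (an assumption, not a general rule): only the placeholder means nothing was rendered. */
    static boolean isUnrenderedShell(String html) {
        return appRootText(html).map(t -> t.equals("loading...")).orElse(false);
    }

    public static void main(String[] args) {
        // Shortened copy of the HTML that jsoup printed in the question.
        String html = "<body> <app-root>\n  loading...\n </app-root>\n"
                + " <script src=\"main.7932c68952979c366236.bundle.js\"></script>\n</body>";
        System.out.println(isUnrenderedShell(html)); // prints "true"
    }
}
```

If jsoup is already on the classpath, selecting the element from the parsed document would be the more robust way to do the same check; the regex is only used here to keep the sketch dependency-free.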

If you check the Network tab in the browser's dev tools, you will see additional XHR requests after the initial load; these are what fetch the data. The ngcontent attributes on the tags indicate that the page is rendered with Angular, which is a JavaScript framework.
This is done to make page loads more efficient, and it also makes scraping a bit harder.

AFTER CHECKING

The Network tab shows multiple requests after the page load that return JSON responses. Look at those and work out which request headers are mandatory for calling them yourself. As the image shows, one of the interesting ones is: https://www.wtfskins.com/api/v1/p2ptrading/usertrades/
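Requesting such an endpoint directly could look roughly like the sketch below, which uses the JDK's own `java.net.http.HttpClient` (Java 11+). The URL is the one named above; the `Accept` and `User-Agent` headers are assumptions, not verified requirements of this API, and the class name `ApiFetch` is made up. The endpoint may also need cookies or tokens that only the Network tab will reveal.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ApiFetch {

    // Endpoint named in the answer; it may have changed or require authentication.
    static final String TRADES_URL = "https://www.wtfskins.com/api/v1/p2ptrading/usertrades/";

    /** Builds a GET request asking for JSON. Both headers are assumptions to adjust after inspecting the Network tab. */
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .header("User-Agent", "Mozilla/5.0") // assumption: some servers reject the default client agent
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        // Send the request and read the whole response body as a string.
        HttpResponse<String> response =
                client.send(buildRequest(TRADES_URL), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // raw JSON, if the request was accepted
    }
}
```

If the status code is not 200, compare the headers the browser sends for the same request and add the missing ones to `buildRequest`. The returned JSON can then be parsed with whichever JSON library you prefer.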

You can start by looking at How the Web works, along with the subtopics on asynchronous JavaScript requests and REST API basics. If you are not familiar with web development, the research will take a bit of time.
