
Scraping in R: don't know how to get data from an aspx form

Hello, I'm trying to scrape data from https://eservicios2.aguascalientes.gob.mx/sop/geobras/UI/frmObrasTodas.aspx

I can get the data from the main page, but I don't know how to get the data from the form:

a) when you choose a row and ask for "Detalle" (detail), it goes to a form;
b) I don't know how to follow the link.

I need to get the data from each row. Can anybody help me?

The main issue is that this is an ASP.NET web site. When you select a row, that selection almost certainly fires a server-side event. You might be able to write some JavaScript to select a row, but the next step is even more of a challenge: once a row is selected, you have to click a button, and that button runs server-side code which reads the selected row value. This is quite unlike a simple web site driven by hyperlinks.

.NET sites are fully driven from VB.NET or C# code. They don't use simple hyperlinks, or even parameters in the web URL, to drive navigation.

So, after you select a row (perhaps possible in JS), you would then have to click the details button. This, again, can be done with JavaScript.

Say, in jQuery like this:

$('#NameOfButton').click();

So ASP.NET sites don't use the simple code you get from someone who took that 3-day web developer program promising that you are now an experienced web developer. As a result, ASP.NET sites don't use plain HTML markup and simple hyperlinks to drive the site. There are no "links" for each row; there is only server-side code that runs to pull the data from the database, render that information, and THEN send it down as HTML markup.

The bottom line? The site is not simply HTML with hyperlinks you can click on. When you click that button, the code behind (written in a nice language like C# or VB.NET) runs. There is thus no markup or JavaScript doing the real work here; you are talking about clean server-side code (written in a fantastic IDE, Visual Studio).

This means that aspx web sites are driven by code behind and, as a result, are rather difficult to scrape in an automated fashion. You can grab the page you are on, but since there are no hyperlinks to the additional data (such as the details), you don't have a simple URL to follow here.
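You can see this from R: parse the page and look at its form. A minimal rvest sketch (the field names shown are the usual ASP.NET ones; what the real page exposes may differ):

library(rvest)

url <- "https://eservicios2.aguascalientes.gob.mx/sop/geobras/UI/frmObrasTodas.aspx"
page <- read_html(url)

# There are no per-row hyperlinks: everything posts back through one <form>,
# and these hidden inputs carry the server state any request must echo back
page %>% html_elements("input[type='hidden']") %>% html_attr("name")
# e.g. "__VIEWSTATE" "__VIEWSTATEGENERATOR" "__EVENTVALIDATION" ...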

Worse yet, the setup code (what runs when you select a single row) also has to execute in most cases. Only if all values are set up 100% correctly BEFORE you hit the "Detalle" button will this work. And note that the details page has no parameters in its URL: the correct code behind has to run BEFORE that second page launches. Worse still, the second page very likely checks that the request came from this same site, so you cannot just type in a URL for the second page; it will not work.
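In principle you can replicate that two-step postback from R with httr. A rough, hedged sketch follows; the two __EVENTTARGET control names are made up, and you would have to read the real ones out of the page's __doPostBack(...) calls in your browser's dev tools. Even then the server may still reject the request:

library(httr)
library(rvest)

url <- "https://eservicios2.aguascalientes.gob.mx/sop/geobras/UI/frmObrasTodas.aspx"

# Helper: pull the ASP.NET hidden state fields out of a parsed page
get_state <- function(page) {
  fields <- c("__VIEWSTATE", "__VIEWSTATEGENERATOR", "__EVENTVALIDATION")
  vals <- lapply(fields, function(f)
    html_attr(html_element(page, sprintf("input[name='%s']", f)), "value"))
  setNames(vals, fields)
}

page1 <- read_html(GET(url))

# Step 1: the "select a row" postback (hypothetical control name / argument)
resp_sel <- POST(url,
  body = c(get_state(page1), list(
    `__EVENTTARGET`   = "ctl00$ContentPlaceHolder1$gvObras",
    `__EVENTARGUMENT` = "Select$0")),
  encode = "form")

# Step 2: the "Detalle" click, using the FRESH state returned by step 1,
# plus a Referer header in case the server checks where the request came from
page2 <- read_html(resp_sel)
resp_det <- POST(url,
  add_headers(Referer = url),
  body = c(get_state(page2), list(
    `__EVENTTARGET`   = "ctl00$ContentPlaceHolder1$btnDetalle",
    `__EVENTARGUMENT` = "")),
  encode = "form")

resp_det$url               # same URL: any "redirect" happened server side
detail <- read_html(resp_det)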

And in fact, if you look even closer: when you hit the details button, the page reloads and renders what is clearly a whole new page and layout.

But note how the URL DOES NOT change! They are not even using an iframe for this.

This is because they are using what is called a server-side redirect. The telltale sign is that the URL stays the same while the whole page layout is 100% different. What happened is that the server redirected to an entirely new page; but since the browser did not perform that navigation, the code behind simply loads the new page and sends it down to the client. The server can send anything it wants to the client this way, including a whole new page, and you never see the URL change.

Again, this is typical of ASP.NET systems, in which server-side code, not client-side code, drives the web site.

You "might" be able to automate scraping. But you would need some custom code to select a given row, and then some code to click the details button. And that's going to be a REAL challenge, since any changes to the web page code (by you) also tend to be check for, and not allow server side.

The only practical web-scrape approach would be to use a desktop tool that hosts a WHOLE instance of the web browser, lets you, the user, navigate to the page that displays the data, and then hit some "capture" button in your application that reads and parses out the data, as you are doing now for the main page.
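In R, the closest thing to that "drive a real browser, then capture" approach is RSelenium. A rough sketch, assuming a working local Selenium/driver setup:

library(RSelenium)
library(rvest)

rD    <- rsDriver(browser = "firefox", port = 4545L, verbose = FALSE)
remDr <- rD$client

remDr$navigate("https://eservicios2.aguascalientes.gob.mx/sop/geobras/UI/frmObrasTodas.aspx")

# Click through to the detail view by hand (or with findElement()/clickElement()),
# then "capture": parse whatever the browser is currently rendering
detail <- read_html(remDr$getPageSource()[[1]])
detail %>% html_elements("table") %>% html_table()

remDr$close()
rD$server$stop()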
