How to scrape page by page

Question

I want to scrape PubMed, but I found that the URL doesn't contain a page number.

For example, this is the URL of the first page: https://www.ncbi.nlm.nih.gov/pubmed?term=(cancer)%20AND%20(%222014%22%5BDate%20-%20Publication%5D%20%3A%20%222017%22%5BDate%20-%20Publication%5D). However, when I click through to the next page manually, the URL becomes https://www.ncbi.nlm.nih.gov/pubmed, with no query string at all.

So I cannot scrape by changing a page number in the URL.

What should I do to solve this problem?

Thanks~

Answer 1

You can specify the page number with a POST request. The name of the form element that provides the value is:

EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Pubmed_Pager.cPage

If you're using curl, change the request to a POST and add the above key to the POST data, setting its value to whatever page you want. You might have to include some other values in the POST to make the request valid; inspect the source of the page to see which other values are expected.
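
If you would rather do this from a script than from curl, the same idea might look roughly like the Python sketch below. It uses only the standard library. The `term` field name, the helper names `build_page_request` and `fetch_page`, and the `extra_fields` parameter are assumptions made for illustration; only the pager key comes from the page itself. The site may well reject the request until the other hidden form fields are supplied.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

PUBMED_URL = "https://www.ncbi.nlm.nih.gov/pubmed"
# Form element that carries the requested page number (from the page source).
PAGE_FIELD = "EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Pubmed_Pager.cPage"


def build_page_request(term, page, extra_fields=None):
    """Build (but do not send) a POST request for one page of results.

    Assumption: the search text is sent in a field called "term".
    """
    fields = {"term": term, PAGE_FIELD: str(page)}
    # Other hidden form fields copied from the page source, if required.
    fields.update(extra_fields or {})
    body = urlencode(fields).encode("utf-8")
    return Request(PUBMED_URL, data=body, method="POST")


def fetch_page(term, page, extra_fields=None):
    """Send the POST request and return the response body as text."""
    with urlopen(build_page_request(term, page, extra_fields)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_page("cancer", 2)` would then request the second page; looping over page numbers gives page-by-page scraping, provided the server accepts the fields sent.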
