
How to scrape page by page

I want to scrape PubMed, but I found that the URL doesn't contain a page number.

For example, this is the URL of the first page: https://www.ncbi.nlm.nih.gov/pubmed?term=(cancer)%20AND%20(%222014%22%5BDate%20-%20Publication%5D%20%3A%20%222017%22%5BDate%20-%20Publication%5D). However, if I click "next page" manually, the URL becomes https://www.ncbi.nlm.nih.gov/pubmed.

Thus I cannot scrape by changing the page number in the URL.

What should I do to solve this problem?

Thanks~

You can specify a page number with a POST request. The name of the form element that provides the value is:

EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Pubmed_Pager.cPage

If you're using curl, change the request to a POST and add the above key to the POST data, setting its value to whatever page you want. You might have to include some other values in the POST to make the request valid; inspect the source of the page to see which other values are expected.
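As a rough illustration of the same idea in Python rather than curl, the sketch below builds a POST request that carries the page field. The helper name `build_page_request`, the `term` field name (taken from the query string in the question), and the `extra_fields` parameter are assumptions for illustration; the real form may require additional hidden fields, which you would copy from the page source.

```python
from urllib.parse import urlencode
from urllib.request import Request

PUBMED_URL = "https://www.ncbi.nlm.nih.gov/pubmed"
# Field name quoted in the answer above.
PAGE_FIELD = "EntrezSystem2.PEntrez.PubMed.Pubmed_ResultsPanel.Pubmed_Pager.cPage"


def build_page_request(term, page, extra_fields=None):
    """Build (but do not send) a POST request for one results page.

    Assumption: the search term can be posted as "term", mirroring the
    query string in the question. Any other hidden form fields the site
    expects can be passed in extra_fields.
    """
    fields = {"term": term, PAGE_FIELD: str(page)}
    if extra_fields:
        fields.update(extra_fields)
    data = urlencode(fields).encode("utf-8")
    return Request(PUBMED_URL, data=data, method="POST")


if __name__ == "__main__":
    req = build_page_request("cancer", 2)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req).read()
```

Looping over `page` values and sending each request would then walk through the result pages, provided the server accepts the fields supplied.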

