
Extract complete URL from a link

I am scraping amazon.in using scrapy-playwright. I can extract the description, rating and price of the desired item. However, to go to the next page I want to extract the href of the Next Page button at the bottom of the page.

Through my scrapy-playwright Python code I am able to extract the href of the Next button as: href="/s?k=Soap+for+men&page=2"

When I extract URL using the browser, it appears like: https://www.amazon.in/s?k=soap+for+men&page=2&crid=1A43B14UY65X0&qid=1671472636&sprefix=soap+for+men%2Caps%2C262&ref=sr_pg_1

How do I generate the complete URL, including crid, from the link extracted through code?

crid, qid and sprefix are query parameters to specify additional information about the request being made to the server.

crid: This stands for "customer request ID". It is a unique identifier that is generated by Amazon to track customer requests.

qid: This stands for "query ID". It is a unique identifier that is generated by Amazon to track search queries.

sprefix: This stands for "search prefix". It specifies the prefix for the search query, which can be used to refine the search results.
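To see exactly which parameters the browser URL carries beyond the extracted href, the query string can be split with Python's standard library. This is only a sketch using the URL quoted in the question; the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the browser, as quoted in the question
browser_url = (
    "https://www.amazon.in/s?k=soap+for+men&page=2&crid=1A43B14UY65X0"
    "&qid=1671472636&sprefix=soap+for+men%2Caps%2C262&ref=sr_pg_1"
)

# parse_qs returns a dict mapping each parameter name to a list of values;
# it decodes '+' as a space and '%2C' as a comma
params = parse_qs(urlsplit(browser_url).query)

for name, values in params.items():
    print(name, "=", values[0])
```

Only k and page also appear in the href extracted by the spider; the rest are the additional parameters.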

These query parameters are used by Amazon to track and optimize the performance of their search function. They do not necessarily have any meaning to the user or the content of the page being requested. You can run your spider without these query parameters and it won't make any difference to the output.
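Since the extra parameters are not needed, the remaining step is to turn the relative href into an absolute URL. A minimal sketch with the standard library follows; page_url is an assumed value for the page currently being scraped, and next_href is the value from the question.

```python
from urllib.parse import urljoin

# Assumption: the URL of the results page the spider is currently on
page_url = "https://www.amazon.in/s?k=Soap+for+men"

# Relative href extracted from the Next button (from the question)
next_href = "/s?k=Soap+for+men&page=2"

# urljoin resolves the relative link against the scheme and host of page_url
next_url = urljoin(page_url, next_href)
print(next_url)
```

Inside a Scrapy callback the same resolution is available as response.urljoin(next_href), and response.follow(next_href, callback=self.parse) accepts the relative href directly, so the spider does not have to hard-code the domain.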
