I am trying to collect restaurant links, but I can only get the first 30 and none of the others. There are hundreds of restaurants in the Madrid area, but the pagination shows only 30 per page, and the following code retrieves only those 30:

import re
import requests

city_name = 'Madrid'
geo_code = '187514'
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

# Fetch the first listing page (it only contains the first 30 restaurants).
data = requests.get(
    "https://www.tripadvisor.com/Restaurants-g{}-{}.html".format(geo_code, city_name),
    headers=headers,
).text

# 'links.txt' is a placeholder file name; the original snippet did not show how f was opened.
with open('links.txt', 'w') as f:
    for link in re.findall(r'"detailPageUrl":"(.*?)"', data):
        # Strip any leading slash so the joined URL has no double slash.
        next_link = "https://www.tripadvisor.com.sg/" + link.lstrip('/')
        print(next_link)
        f.write('%s\n' % next_link)

Found the solution: I had to add "oa" followed by the number of the result to the URL (here n_review holds that segment), like:

data = requests.get(
    "https://www.tripadvisor.com/Restaurants-g{}-{}-{}.html".format(geo_code, city_name, n_review),
    headers=headers,
).text