
Scraping website search engine using BeautifulSoup

I am trying to scrape the search engine of the following website. However, I only get a fraction of the content back.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup
my_url = 'https://www.kvk.nl/zoeken/#!zoeken&q=ING&index=4&site=kvk2014&start=0'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# Data pull
page_soup = BeautifulSoup(page_html, "html.parser")

page_soup contains only a couple of lines with href attributes and none of the information that is visible on the my_url page. I am only really interested in the first search result on the page: the full name of the company, ING Bank NV, along with the remaining information for that company.

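The real content is not in the page's HTML. Everything after the # in the URL is never sent to the server, so the page's JavaScript reads it and loads the search results through a separate request, such as: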

https://zoeken.kvk.nl/search.ashx?callback=jQuery1124043501887376358495_1504000357055&q=ING&index=4&site=kvk2014&start=20&_=1504000357058
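To make the difference between the two URLs concrete, here is a small sketch that splits them with the standard library. It only inspects the strings and makes no network request. The variable names are illustrative, and what each parameter means to the server is an assumption that this sketch does not verify.

```python
from urllib.parse import urlsplit, parse_qs

# URL from the question: all search parameters sit in the fragment (after '#').
page_url = "https://www.kvk.nl/zoeken/#!zoeken&q=ING&index=4&site=kvk2014&start=0"

# URL the browser actually requests for the data (copied from the answer).
data_url = (
    "https://zoeken.kvk.nl/search.ashx"
    "?callback=jQuery1124043501887376358495_1504000357055"
    "&q=ING&index=4&site=kvk2014&start=20&_=1504000357058"
)

page_parts = urlsplit(page_url)
data_parts = urlsplit(data_url)

# HTTP clients do not send the fragment, so this is what urlopen really fetches.
requested = page_parts._replace(fragment="").geturl()

# The data URL carries the same search terms as a real query string.
params = parse_qs(data_parts.query)

print("fragment (client-side only):", page_parts.fragment)
print("URL sent to the server:     ", requested)
print("data request parameters:    ", params)
```

The page URL has an empty query string, which is why the server returns the same bare page no matter what is searched for. The callback and _ parameters look like values that jQuery generates for a JSONP request, so they probably do not need to match the ones shown here.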

Use the Network tab of Chrome's developer tools to inspect the HTTP requests the page makes, then request that URL directly to get the real data.
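A minimal sketch of requesting that data URL directly could look like the following. The endpoint and parameter names are copied from the URL above and may have changed since. The helper names (build_search_url, strip_jsonp, parse_payload, fetch_search) are hypothetical, and the format of the payload inside the callback wrapper is an assumption, so the code falls back to returning raw text when it is not valid JSON.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the request seen in the browser; it may no longer exist.
SEARCH_ENDPOINT = "https://zoeken.kvk.nl/search.ashx"


def build_search_url(query, start=0, callback="cb"):
    """Build the data URL; parameter names are copied from the observed request."""
    params = {
        "callback": callback,  # assumed: any callback name is accepted
        "q": query,
        "index": 4,
        "site": "kvk2014",
        "start": start,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def strip_jsonp(text):
    """Return the payload inside `callback( ... )`, or the text unchanged."""
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    return match.group(1) if match else text


def parse_payload(payload):
    """Decode the payload as JSON if possible; otherwise return it as-is."""
    try:
        return json.loads(payload)
    except ValueError:
        return payload


def fetch_search(query, start=0):
    """Fetch one page of results (performs a network request)."""
    request = Request(
        build_search_url(query, start),
        headers={"User-Agent": "Mozilla/5.0"},  # some servers reject the default agent
    )
    with urlopen(request) as response:
        body = response.read().decode("utf-8", errors="replace")
    return parse_payload(strip_jsonp(body))
```

Calling fetch_search("ING") would then return either a decoded JSON object or a string. If the string turns out to be an HTML fragment, it can be passed to BeautifulSoup(result, "html.parser") exactly as in the question to pick out the first search result.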
