Scraping website search engine using BeautifulSoup

Question

I am trying to scrape the search results of the website below. However, I only get a fraction of the content back.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup
my_url = 'https://www.kvk.nl/zoeken/#!zoeken&q=ING&index=4&site=kvk2014&start=0'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# Data pull
page_soup = BeautifulSoup(page_html, "html.parser")

page_soup contains only a couple of lines of href links and none of the information that is visible on the my_url page. I am only really interested in the first search result on the webpage: the full name of the company, ING Bank NV, along with the remaining information for that company.

Answer 1

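The real content is not in the HTML you downloaded. The page loads it with JavaScript through a separate request, such as:

https://zoeken.kvk.nl/search.ashx?callback=jQuery1124043501887376358495_1504000357055&q=ING&index=4&site=kvk2014&start=20&_=1504000357058

Everything after the # in your URL is a fragment, which is never sent to the server, so urlopen only receives the bare search page without any results. Use the Network tab of Chrome's developer tools to inspect the HTTP requests the page makes and find the one that returns the real data.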
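As a rough sketch of how that request could be called directly from Python: the endpoint and the q, index, site and start parameters are copied from the URL above, but the response format is an assumption. The callback parameter suggests JSONP (JSON wrapped in a function call), so the helper below strips that wrapper before parsing. The names strip_jsonp, build_search_url and fetch_search and the callback name "cb" are made up for illustration, and the site may have changed since this was written.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the request URL seen in the browser's Network tab.
SEARCH_URL = "https://zoeken.kvk.nl/search.ashx"


def strip_jsonp(text):
    """Parse the payload inside a JSONP wrapper such as callback({...});"""
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in a JSONP callback")
    return json.loads(match.group(1))


def build_search_url(query, start=0, callback="cb"):
    """Build the search URL; "cb" is an arbitrary (hypothetical) callback name."""
    params = {
        "callback": callback,
        "q": query,
        "index": 4,
        "site": "kvk2014",
        "start": start,
    }
    return SEARCH_URL + "?" + urlencode(params)


def fetch_search(query, start=0):
    """Download one page of results (assumes the endpoint still answers with JSONP)."""
    with urlopen(build_search_url(query, start)) as response:
        body = response.read().decode("utf-8")
    return strip_jsonp(body)


# Offline demonstration with a made-up payload, not real kvk.nl data.
sample = 'cb({"example": [1, 2, 3]});'
print(strip_jsonp(sample))
```

Calling fetch_search("ING") would then return whatever the payload decodes to. If that turns out to contain HTML rather than structured data, it can be handed to BeautifulSoup in the same way as page_html in the question.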

1 answers

solution1
0 2017-08-29 09:54:38
