Pycurl javascript

Question

I wrote a Python 3 script that runs a search on a search engine (DuckDuckGo), fetches the HTML source code, and writes it to a text file.

import pycurl
from io import BytesIO

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://duckduckgo.com/?q=test')
c.setopt(c.WRITEDATA, buffer)
c.setopt(c.FOLLOWLOCATION, True)
c.perform()
c.close()

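# getvalue() returns bytes; write them in binary mode so the file holds
# the HTML itself rather than the "b'...'" repr produced by str(body).
body = buffer.getvalue()
with open("output.htm", "wb") as html_file:
    html_file.write(body)
print(body.decode('iso-8859-1'))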

That part of the code works. However, when I open the output.htm file that should contain the HTML source code of the results page, I don't see any results (I only get an input field with my search term in it). I would like to get the same HTML source code that I get by running curl https://duckduckgo.com/?q=test in my terminal.

Answer 1

DuckDuckGo's HTML pages use JavaScript to load the search results into the markup, so neither curl nor PycURL can retrieve the same HTML content you see in a browser: they only fetch resources over the network and do not execute any JavaScript.
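One way to confirm this on a saved page is to count its tags: a page that depends on JavaScript typically contains script tags and the search form but few or no result links. The following is a minimal sketch using only the standard library; the SAMPLE markup is a hypothetical stand-in for the downloaded file, not DuckDuckGo's actual response, and the names TagCounter and count_tags are made up for illustration.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each start tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html_text):
    """Parse html_text and return a dict mapping tag name to count."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


# Hypothetical stand-in for what a client without JavaScript receives:
# a search form and a script reference, but no result links.
SAMPLE = """
<html><body>
<form action="/"><input name="q" value="test"></form>
<script src="/hypothetical-bundle.js"></script>
</body></html>
"""

counts = count_tags(SAMPLE)
print("script tags:", counts.get("script", 0))
print("links:", counts.get("a", 0))
```

To inspect the real download, read output.htm, decode it, and pass the text to count_tags in place of SAMPLE.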

Instead of scraping the page, use the DuckDuckGo API (https://duckduckgo.com/api) to retrieve search results.
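As a rough sketch of what an API call could look like, the snippet below builds a request URL and pulls links out of a JSON response. It assumes the Instant Answer endpoint at https://api.duckduckgo.com/ with the q, format and no_html parameters, and a response containing a "RelatedTopics" list whose entries carry "Text" and "FirstURL"; check these against the API page before relying on them. It uses urllib from the standard library rather than PycURL so that it runs without extra dependencies, and the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Instant Answer API; verify on the API page.
API_ENDPOINT = "https://api.duckduckgo.com/"


def build_url(query):
    """Build the request URL for a query (parameter names are assumptions)."""
    params = {"q": query, "format": "json", "no_html": 1}
    return API_ENDPOINT + "?" + urlencode(params)


def related_links(payload):
    """Return (text, url) pairs from the assumed "RelatedTopics" field."""
    links = []
    for topic in payload.get("RelatedTopics", []):
        # Entries that lack either key (for example grouped topics) are skipped.
        if "Text" in topic and "FirstURL" in topic:
            links.append((topic["Text"], topic["FirstURL"]))
    return links


def search(query):
    """Fetch the JSON response for a query and decode it into a dict."""
    with urlopen(build_url(query)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling related_links(search("test")) performs the request and returns the extracted pairs. Note that this kind of API is likely to return instant answers and related topics rather than the full list of results shown in the browser.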

solution1
1 2018-09-28 09:06:28
