
Pycurl javascript

I wrote a Python 3 script that runs a query on a search engine (DuckDuckGo), fetches the HTML source of the results page, and writes it to a text file.

import pycurl
from io import BytesIO

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://duckduckgo.com/?q=test')
c.setopt(c.WRITEDATA, buffer)
c.setopt(c.FOLLOWLOCATION, True)
c.perform()
c.close()

body = buffer.getvalue()
# Write the raw bytes; str(body) would save the b'...' repr instead of the HTML.
with open("output.htm", "wb") as html_file:
    html_file.write(body)
print(body.decode('iso-8859-1'))

That part of the code works properly. However, when I open the output.htm file containing the HTML source of the results page, I don't see any results (only an input field with my search term in it). I would like to get the same HTML source code that I would get by running curl https://duckduckgo.com/?q=test in my terminal.

DuckDuckGo's HTML pages use JavaScript to load the search results into the markup, so neither curl nor PycURL can retrieve the HTML you would see in a browser: they merely fetch the resource and do not run any JavaScript.
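To illustrate the point, here is a minimal sketch using only the standard library. The markup in SAMPLE_PAGE is made up for the example (it is not DuckDuckGo's real page), and the element names and script path are assumptions. It shows what a fetch-only client sees when the results are filled in by a script: the search input and the script tag are there, but no result links.

```python
from html.parser import HTMLParser

# Hypothetical page: the results container stays empty until the script runs.
SAMPLE_PAGE = """
<html>
  <body>
    <input name="q" value="test">
    <div id="results"></div>
    <script src="/load-results.js"></script>
  </body>
</html>
"""


class TagCounter(HTMLParser):
    """Count the start tags that appear in the static markup."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


counter = TagCounter()
counter.feed(SAMPLE_PAGE)

# Result links would only exist after a browser executed the script.
link_count = counter.counts.get("a", 0)
script_count = counter.counts.get("script", 0)
print(f"links: {link_count}, scripts: {script_count}")
```

A browser would execute the script and add the links afterwards; a plain HTTP fetch stops at the markup above.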

Instead of scraping, use DuckDuckGo's API (https://duckduckgo.com/api), which returns data straight from their servers.
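As a rough sketch of what that could look like with PycURL: the endpoint https://api.duckduckgo.com/ with format=json, and the response fields Heading and AbstractText, are assumptions based on the Instant Answer API as I understand it, so check them against the API page first. The helper names (build_api_url, fetch_json, summarize) are made up for this example. As far as I know, this API returns instant answers rather than the full list of web results.

```python
import json
from io import BytesIO
from urllib.parse import urlencode

# Assumed endpoint of the Instant Answer API; verify against the API page.
API_ENDPOINT = "https://api.duckduckgo.com/"


def build_api_url(query):
    """Build an API URL that asks for a JSON response."""
    return API_ENDPOINT + "?" + urlencode({"q": query, "format": "json"})


def fetch_json(url):
    """Fetch a URL with PycURL and decode the JSON body (needs network access)."""
    import pycurl  # imported here so the other helpers work without it

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, buffer)
    c.setopt(c.FOLLOWLOCATION, True)
    c.perform()
    c.close()
    return json.loads(buffer.getvalue().decode("utf-8"))


def summarize(answer):
    """Pull the (assumed) heading and abstract fields out of a decoded response."""
    return answer.get("Heading", ""), answer.get("AbstractText", "")


url = build_api_url("test")
print(url)
# With network access and pycurl installed:
# heading, abstract = summarize(fetch_json(url))
```

Because the response is JSON, there is no markup to parse and no JavaScript to run.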
