How can I get the first result of a Google search that is not an ad using Python?

I am trying to get the financial statements of a bunch of Australian companies as PDFs. I have all the companies stored in a pandas DataFrame; their names are in a column called 'Companies'. This is my code so far to search for the URLs:

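import webbrowser  # imported for opening results later; not used yet

# Base Google search URL; the query string is appended after "q=".
tabUrl = "http://google.com/?#q="
append = "+financial+report+2017"
file_type = 'filetype%3Apdf+'  # URL-encoded "filetype:pdf "

# data is the pandas DataFrame that holds the company names.
for company in data["Company"]:
    googleSearch = tabUrl + file_type + company.replace(" ", "+") + append
    print(googleSearch)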

Every search returns (unsurprisingly) a number of ads as the first results. How do I open the first result that is not an ad?

Thanks!

Right now you are sending the request to the Google web page URL, so the results displayed contain the same ads you would see if you searched at https://www.google.com

A better way to do this is to use the Google Custom Search API to send your requests and get the results. The documentation is here: https://developers.google.com/custom-search/json-api/v1/using_rest

From the documentation, you can see that you can make REST requests to the service endpoint once you have generated an API key and a Custom Search Engine ID:

GET https://www.googleapis.com/customsearch/v1?key=INSERT_YOUR_API_KEY&cx=017576662512468239146:omuauf_lfve&q=lectures
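
As a rough illustration, the request above could be built and read in Python along the following lines. This is only a sketch under some assumptions: the function names (build_search_url, first_result_link, search_first_link) are made up for this example, the API key and search engine ID are placeholders you would replace with your own, and it assumes the JSON response lists its results under "items", each with a "link" field.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, company, year=2017):
    """Build a Custom Search request URL for one company's report (hypothetical helper)."""
    # Same search terms as in the question, but left unencoded here:
    # urlencode() takes care of escaping spaces and the colon.
    query = f"filetype:pdf {company} financial report {year}"
    # num=1 asks for a single result, since only the first one is needed.
    params = {"key": api_key, "cx": engine_id, "q": query, "num": 1}
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def first_result_link(response):
    """Return the link of the first result in a parsed response, or None if there is none."""
    # Assumption: results are listed under "items", each with a "link" field.
    items = response.get("items", [])
    return items[0]["link"] if items else None


def search_first_link(api_key, engine_id, company):
    """Send the request and return the first result's URL (needs network access and a valid key)."""
    url = build_search_url(api_key, engine_id, company)
    with urllib.request.urlopen(url) as resp:
        return first_result_link(json.load(resp))
```

In the loop from the question, you could then call search_first_link(api_key, engine_id, company) for each company in data["Company"] and pass any non-None link to webbrowser.open(link).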
