
How to Retrieve the First 10 Google Search Results Using Python Requests

I've seen lots of questions on this subject, and I found out that Google has been changing the way its search engine APIs work.

This link > get the first 10 google results using googleapi shows EXACTLY what I need, but I don't know whether that approach still works.

I need this for my term paper, but reading the Google docs I couldn't find a way to do it. I've gone through the "get started" steps, and all I ended up with was a private search engine built with Custom Search Engine (CSE).

Alternatively, you can use Python with Selenium and PhantomJS (or another browser) to step through Google's search results and grab the content, as sketched below. I haven't done that personally and don't know what challenges are involved.
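A minimal sketch of that browser-automation idea, assuming Selenium 4 with a headless Chrome driver (recent Selenium releases no longer support PhantomJS); the div.tF2Cxc selector is borrowed from the scraping answer further down and may break whenever Google changes its markup:

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--headless')          # run the browser without a window
driver = webdriver.Chrome(options=options)

driver.get('https://www.google.com/search?q=java')

# Each organic result currently sits in a div with class tF2Cxc
for container in driver.find_elements(By.CSS_SELECTOR, 'div.tF2Cxc')[:10]:
    title = container.find_element(By.TAG_NAME, 'h3').text
    link = container.find_element(By.TAG_NAME, 'a').get_attribute('href')
    print(title, link)

driver.quit()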

I believe the best way would be to use their search APIs. Please try the one you pointed out. If it doesn't work, look for the new APIs.

I came across this question while trying to solve the same problem myself, and I found an updated solution.

Basically, I followed this guide at Google Custom Search to generate my own API key and search engine, then used Python requests to retrieve the JSON results.

import requests
import simplejson

def search(query):
    api_key = 'MYAPIKEY'              # Custom Search JSON API key
    search_engine_id = 'MYENGINEID'   # the cx identifier of your search engine
    url = "https://www.googleapis.com/customsearch/v1/siterestrict?key=%s&cx=%s&q=%s" % (
        api_key, search_engine_id, query)
    result = requests.Session().get(url)
    return simplejson.loads(result.content)   # parsed JSON response
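A quick usage sketch, assuming the standard response shape of the Custom Search JSON API (up to 10 results per request under the items key, each with title and link):

results = search('python requests')
for item in results.get('items', [])[:10]:
    print(item['title'], item['link'])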

I answered the question you linked to.

Here's the link to that answer and the full code example. I'll copy the code here for faster access.


First way: using a custom script that returns JSON:

from bs4 import BeautifulSoup
import requests
import json

# A desktop User-Agent so Google serves the regular HTML results page
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

html = requests.get('https://www.google.com/search?q=java&oq=java',
                    headers=headers).text

soup = BeautifulSoup(html, 'lxml')

summary = []

# Each organic result currently lives in a div with class tF2Cxc;
# these class names are generated by Google and change over time.
for container in soup.findAll('div', class_='tF2Cxc'):
    heading = container.find('h3', class_='LC20lb DKV0Md').text
    article_summary = container.find('span', class_='aCOpRe').text
    link = container.find('a')['href']

    summary.append({
        'Heading': heading,
        'Article Summary': article_summary,
        'Link': link,
    })

print(json.dumps(summary, indent=2, ensure_ascii=False))

Second way: using Google Search Engine Results API from SerpApi:

import os
from serpapi import GoogleSearch

params = {
    "engine": "google",               # use the Google engine
    "q": "java",                      # search query
    "api_key": os.getenv("API_KEY"),  # SerpApi key read from the environment
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results["organic_results"]:
    print(f"Title: {result['title']}\nLink: {result['link']}\n")

Disclaimer: I work for SerpApi.
