
How to query with time filters in GoogleScraper?

Although Google's official API returns no time information in its query results, and offers no time filtering for keywords, the advanced search does have a time filtering option:

Google results for stackoverflow in the last one hour

The GoogleScraper library offers many flexible options, but none related to time. How can time features be added using the library?

After a bit of inspection, I found that Google passes the time filter as a qdr value under the tbs key (possibly short for "time-based search", although this is not officially stated):

https://www.google.com/search?tbs=qdr:h1&q=stackoverflow

This returns results from the past hour. The letter d selects days, while m and y select months and years respectively.

To also sort the results by date, add the sbd value (presumably "sort by date") as well: https://www.google.com/search?tbs=qdr:h1,sbd:1&q=stackoverflow
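The URL pattern above can be assembled programmatically. The following is a minimal sketch using only the standard library; the function name build_search_url and its parameters are hypothetical and not part of GoogleScraper, and the tbs format is the unofficial one observed above, so Google may change it at any time.

```python
from urllib.parse import urlencode


def build_search_url(query, time_filter="", sort_by_date=False):
    """Return a Google search URL with an optional tbs time filter.

    Hypothetical helper: `time_filter` is a qdr value such as "h1" or "d15".
    """
    tbs_parts = []
    if time_filter:
        tbs_parts.append("qdr:" + time_filter)
    if sort_by_date:
        tbs_parts.append("sbd:1")

    params = {}
    if tbs_parts:
        params["tbs"] = ",".join(tbs_parts)
    params["q"] = query

    # safe=":," leaves the colon and comma unescaped, matching the URLs above.
    return "https://www.google.com/search?" + urlencode(params, safe=":,")


print(build_search_url("stackoverflow", "h1", sort_by_date=True))
```

Leaving both time_filter and sort_by_date at their defaults yields a plain search URL without any tbs key.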

I was able to inject these parameters into the base Google URL used by GoogleScraper. Insert the lines below at the end of the get_base_search_url_by_search_engine() method (just before the return) in scraping.py:

# Only touch Google's base URL: qdr:<value> limits results by time, sbd:1 sorts them by date.
if "google" in str(specific_base_url):
    specific_base_url = "https://www.google.com/search?tbs=qdr:{},sbd:1".format(config.get("time_filter", ""))

Now use the time_filter option in your config:

from GoogleScraper import scrape_with_config

config = {
    'use_own_ip': True,
    'keyword_file': "keywords.txt",
    'search_engines': ['google'],
    'num_pages_for_keyword': 2,
    'scrape_method': 'http',
    'time_filter': "d15",  # up to 15 days ago
}

search = scrape_with_config(config)
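Since the patch formats the time_filter value into the URL unchecked, it may be worth validating it before building the config. The sketch below is an assumption-laden illustration: check_time_filter is a hypothetical helper, and the accepted pattern (one unit letter optionally followed by a count) is inferred from the examples here. The letter w for weeks is an assumption not covered above.

```python
import re

# Assumed format: a unit letter (h, d, w, m, y) optionally followed by a count,
# e.g. "h1", "d15", "y". The "w" unit is an assumption, not taken from the post.
_TIME_FILTER = re.compile(r"[hdwmy]\d*")


def check_time_filter(value):
    """Return `value` unchanged if it looks like a qdr value, else raise ValueError."""
    if not _TIME_FILTER.fullmatch(value):
        raise ValueError("unexpected time_filter value: {!r}".format(value))
    return value


# Validate the value before placing it in the config dictionary.
time_filter = check_time_filter("d15")
print(time_filter)
```

Failing early this way avoids sending Google a malformed tbs parameter, which would presumably be ignored silently rather than reported.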

The results will only include pages from the given time range. Additionally, the text snippets in the results carry raw date information:

one_sample_result = search.serps[0].links[0]
print(one_sample_result.snippet)

4 mins ago It must be pretty easy - let propertytotalPriceOfOrder = order.items.map(item => +item.unit * +item.quantity * +item.price);. where order is your entire json object.
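The raw date prefix can be separated from the snippet text with a small parser. This is a rough sketch that assumes English prefixes of the form "<number> min(s)/hour(s)/day(s) ago", as in the sample above. Google's wording may differ by locale or result type, and split_relative_age is a hypothetical name, not a GoogleScraper function.

```python
import re
from datetime import timedelta

# Assumed unit spellings, mapped to timedelta keyword arguments.
_UNITS = {"min": "minutes", "hour": "hours", "day": "days"}
_PREFIX = re.compile(r"^(\d+)\s+(min|hour|day)s?\s+ago\b\s*")


def split_relative_age(snippet):
    """Return (age, rest): age is a timedelta, or None if no prefix is found."""
    match = _PREFIX.match(snippet)
    if match is None:
        return None, snippet
    amount = int(match.group(1))
    unit = match.group(2)
    age = timedelta(**{_UNITS[unit]: amount})
    return age, snippet[match.end():]


age, text = split_relative_age("4 mins ago It must be pretty easy")
print(age, "|", text)
```

Subtracting the returned timedelta from the scrape time gives an approximate publication time for each result.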
