
How to query with time filters in GoogleScraper?

Although Google's official API does not return time information in query results, and does not even offer time filtering for keywords, the advanced search does have a time filtering option:

Google results for stackoverflow in the last hour

The GoogleScraper library offers many flexible options, but none related to time. How can time features be added using the library?

After a bit of inspection, I've found that Google sends the time filtering information as a qdr value under the tbs key (possibly meaning "time based search", although this is not officially stated):

https://www.google.com/search?tbs=qdr:h1&q=stackoverflow

This gets the results for the past hour. The letters m and y can be used for months and years respectively.

Also, to sort the results by date, add the sbd value (presumably "sort by date") as well: https://www.google.com/search?tbs=qdr:h1,sbd:1&q=stackoverflow
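
To make the URL format above concrete, here is a minimal sketch that assembles such a URL with the standard library. The function name build_search_url and its parameters are made up for illustration; only the tbs, qdr, sbd and q keys come from the URLs shown above.

```python
from urllib.parse import urlencode


def build_search_url(query, time_filter=None, sort_by_date=False):
    """Build a Google search URL with an optional tbs time filter.

    time_filter is a qdr value such as "h1" or "d15" (hypothetical parameter
    name; the value format follows the URLs shown in this answer).
    """
    tbs_parts = []
    if time_filter:
        tbs_parts.append("qdr:{}".format(time_filter))
    if sort_by_date:
        tbs_parts.append("sbd:1")

    params = {}
    if tbs_parts:
        params["tbs"] = ",".join(tbs_parts)
    params["q"] = query
    # safe=":," leaves the colon and comma unescaped, matching the URLs above
    return "https://www.google.com/search?" + urlencode(params, safe=":,")


print(build_search_url("stackoverflow", "h1"))
# https://www.google.com/search?tbs=qdr:h1&q=stackoverflow
print(build_search_url("stackoverflow", "h1", sort_by_date=True))
# https://www.google.com/search?tbs=qdr:h1,sbd:1&q=stackoverflow
```

Since the tbs key is not officially documented, treat this as a reproduction of what the browser sends rather than a stable interface.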

I was able to insert these keywords into the base Google URL used by GoogleScraper. Insert the lines below at the end of the get_base_search_url_by_search_engine() method (just before return) in scraping.py:

# Override the Google base URL so it carries the time filter and date sorting
if "google" in str(specific_base_url):
    specific_base_url = "https://www.google.com/search?tbs=qdr:{},sbd:1".format(config.get("time_filter", ""))
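
Note that the lines above rewrite the Google base URL unconditionally, so a config without time_filter ends up with an empty qdr value (tbs=qdr:,sbd:1). A possible variation, sketched here as a standalone function, only applies the filter when one is configured. It assumes config behaves like a dict; time_filtered_base_url is a hypothetical helper and not part of GoogleScraper.

```python
def time_filtered_base_url(base_url, config):
    """Return the Google base URL with a tbs filter, or base_url unchanged.

    Hypothetical helper: base_url stands in for specific_base_url and
    config is assumed to be a plain dict with an optional "time_filter" key.
    """
    time_filter = config.get("time_filter")
    # Leave other engines, and configs without a filter, untouched
    if "google" not in str(base_url) or not time_filter:
        return base_url
    return "https://www.google.com/search?tbs=qdr:{},sbd:1".format(time_filter)
```

Inside get_base_search_url_by_search_engine() this would be called as specific_base_url = time_filtered_base_url(specific_base_url, config) just before the return.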

Now use the time_filter option in your config:

from GoogleScraper import scrape_with_config

config = {
    'use_own_ip': True,
    'keyword_file': 'keywords.txt',
    'search_engines': ['google'],
    'num_pages_for_keyword': 2,
    'scrape_method': 'http',
    'time_filter': 'd15',  # results from up to 15 days ago
}

search = scrape_with_config(config)

Results will only include pages from the given time range. Additionally, the text snippets in the results carry raw date information:

one_sample_result = search.serps[0].links[0]
print(one_sample_result.snippet)

4 mins ago It must be pretty easy - let propertytotalPriceOfOrder = order.items.map(item => +item.unit * +item.quantity * +item.price);. where order is your entire json object.
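
Since the date arrives as free text at the start of the snippet, it has to be parsed if you need it as a value. Below is a rough sketch that extracts a leading relative time such as "4 mins ago". Only that one form appears in the sample above; the hour and day variants, and the name snippet_age, are assumptions for illustration.

```python
import re
from datetime import timedelta

# Matches a leading "<number> <unit> ago"; units beyond "mins" are assumed
_AGE_PATTERN = re.compile(
    r"^\s*(\d+)\s+(mins?|minutes?|hours?|days?)\s+ago\b", re.IGNORECASE
)
_UNIT_TO_KWARG = {"min": "minutes", "minute": "minutes", "hour": "hours", "day": "days"}


def snippet_age(snippet):
    """Return the snippet's leading relative age as a timedelta, or None."""
    match = _AGE_PATTERN.match(snippet)
    if match is None:
        return None
    amount = int(match.group(1))
    unit = match.group(2).lower().rstrip("s")  # "mins" -> "min", "hours" -> "hour"
    return timedelta(**{_UNIT_TO_KWARG[unit]: amount})
```

Subtracting the returned timedelta from datetime.now() gives an approximate publication time. Snippets without a recognizable prefix return None.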
