Bing Web Search API and blacklisting (python)

I'm using the Bing Web Search API to get URLs that match very specific queries. Unfortunately, there is also a lot of junk in the API results.

I have now created an extensive blacklist that covers approximately 70% of this "junk".

What is the most effective way to keep the URLs on that list from being appended to my "results" list?

The relevant part of the code:

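import http.client
import json

# params (URL-encoded query string), headers (including the subscription key)
# and count are assumed to be defined earlier in the script.
results = []
conn = http.client.HTTPSConnection('api.cognitive.microsoft.com')
try:
    # A GET request needs no body.
    conn.request("GET", "/bing/v5.0/search?%s" % params, headers=headers)
    response = conn.getresponse()
    json_file = json.loads(response.read())
    # Keep the display URL of every web page result.
    for page in json_file['webPages']['value']:
        results.append([count, page['displayUrl']])
except Exception as e:
    print(e)
finally:
    conn.close()  # close the connection even if the request fails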
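
One common way to do the filtering, sketched here under assumptions rather than taken from the thread: store the blacklist as a set of domain names and check each result's host, and every parent domain of that host, before appending. The names BLACKLIST, host_of, is_blacklisted and filter_results are hypothetical, and the sample response is hand-made to mimic only the fields the question's loop reads. Because displayUrl values may omit the scheme (for example "www.example.com/page"), the sketch prepends one before parsing.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist of bare domain names; their subdomains are blocked too.
BLACKLIST = {"pinterest.com", "spam-example.net"}


def host_of(url):
    """Return the lowercase host of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url  # displayUrl values often lack a scheme
    return (urlsplit(url).hostname or "").lower()


def is_blacklisted(url, blacklist=BLACKLIST):
    """True if the URL's host or any of its parent domains is blacklisted."""
    parts = host_of(url).split(".")
    # For "a.b.example.com" check "a.b.example.com", "b.example.com",
    # "example.com" and "com", each as an O(1) set lookup.
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


def filter_results(json_file, count, blacklist=BLACKLIST):
    """Build [count, displayUrl] pairs, skipping blacklisted URLs."""
    results = []
    for page in json_file.get("webPages", {}).get("value", []):
        url = page["displayUrl"]
        if not is_blacklisted(url, blacklist):
            results.append([count, url])
    return results


# Hand-made sample shaped like the part of the response used in the question.
sample = {
    "webPages": {
        "value": [
            {"displayUrl": "www.pinterest.com/pin/123"},
            {"displayUrl": "https://docs.python.org/3/"},
            {"displayUrl": "blog.spam-example.net/post"},
        ]
    }
}
print(filter_results(sample, count=1))  # → [[1, 'https://docs.python.org/3/']]
```

Matching on whole dot-separated labels instead of plain substrings avoids false positives such as "notpinterest.com" matching "pinterest.com", and the set keeps each check cheap even for a long blacklist.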

You can try Bing Custom Search for this purpose. It lets you restrict results to specific domains, subsites, or web pages, and it also lets you block them. Details are available at customsearch.ai. Free access keys can be obtained here: https://azure.microsoft.com/en-us/try/cognitive-services/?api=bing-custom-search .
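
A minimal sketch of what a Custom Search call might look like, assuming the v7 endpoint https://api.cognitive.microsoft.com/bingcustomsearch/v7.0/search, a customconfig parameter carrying the configuration ID created on customsearch.ai, and the Ocp-Apim-Subscription-Key header. The endpoint may have moved since, so check the current Azure documentation. The function names are hypothetical; fetch_urls needs a valid key and was not exercised here.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed Bing Custom Search v7 endpoint; verify against the current docs.
CUSTOM_SEARCH_ENDPOINT = (
    "https://api.cognitive.microsoft.com/bingcustomsearch/v7.0/search"
)


def build_custom_search_request(query, config_id, subscription_key):
    """Build (but do not send) a Custom Search GET request.

    config_id is the custom configuration ID from customsearch.ai, where
    allowed and blocked sites are maintained instead of in Python code.
    """
    params = urlencode({"q": query, "customconfig": config_id})
    return urllib.request.Request(
        CUSTOM_SEARCH_ENDPOINT + "?" + params,
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
    )


def fetch_urls(request):
    """Send the request and return the result URLs (requires a valid key)."""
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [page["url"] for page in data.get("webPages", {}).get("value", [])]


req = build_custom_search_request("python http.client", "YOUR_CONFIG_ID", "YOUR_KEY")
print(req.full_url)
# → https://api.cognitive.microsoft.com/bingcustomsearch/v7.0/search?q=python+http.client&customconfig=YOUR_CONFIG_ID
```

With this approach the blocking happens on the service side, so the client loop no longer needs its own blacklist check.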

The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.
