I'm using the Bing Web Search API to get URLs that match very specific queries. Unfortunately, there is also a lot of junk in the API results.
I have now built an extensive blacklist that covers roughly 70% of this "junk".
What is the most effective way to exclude that list of URLs from being appended to my "results" array?
The relevant part of the code:
import http.client
import json

# params, headers and count are defined earlier in the script (not shown).
results = []
try:
    conn = http.client.HTTPSConnection('api.cognitive.microsoft.com')
    conn.request("GET", "/bing/v5.0/search?%s" % params, "{body}", headers)
    response = conn.getresponse()
    data = response.read()
    json_file = json.loads(data)
    # Collect the display URL of every web page hit.
    for page in json_file['webPages']['value']:
        results.append([count, page['displayUrl']])
    conn.close()
except Exception as e:
    print(e)
You can try Bing Custom Search for this purpose. It lets you restrict results to specific domains, subsites, or web pages, and it also supports blocking. Details are available at customsearch.ai. Free access keys can be obtained here: https://azure.microsoft.com/en-us/try/cognitive-services/?api=bing-custom-search
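If you would rather stay on the Web Search API and filter on the client side, one option is to keep the blacklist as a set of domains and skip any hit whose host (or a parent domain of it) is in that set. The following is only a minimal sketch: the helper names (hostname_of, is_blacklisted, filter_results) and the sample domains are made up for illustration, and it assumes the blacklist holds bare domains rather than full URLs.

```python
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "https://".
_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")

def hostname_of(display_url):
    """Return the lowercase host of a URL that may lack a scheme."""
    if not _SCHEME.match(display_url):
        # Without "//" urlsplit would treat the host as part of the path.
        display_url = "//" + display_url
    return (urlsplit(display_url).hostname or "").lower()

def is_blacklisted(display_url, blacklist):
    """True if the host or any of its parent domains is in the blacklist set."""
    parts = hostname_of(display_url).split(".")
    # For "a.b.example.com" this checks "a.b.example.com", "b.example.com", ...
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))

def filter_results(json_file, blacklist):
    """Return the displayUrl of every web page hit that is not blacklisted."""
    pages = json_file.get("webPages", {}).get("value", [])
    return [p["displayUrl"] for p in pages
            if not is_blacklisted(p["displayUrl"], blacklist)]

# Hypothetical blacklist and response, shaped like the JSON used in the question.
blacklist = {"spam.example", "junk.test"}
sample = {"webPages": {"value": [
    {"displayUrl": "www.spam.example/page"},
    {"displayUrl": "https://good.example.org/article"},
    {"displayUrl": "junk.test"},
]}}
kept = filter_results(sample, blacklist)
```

Because the blacklist is a set, each lookup takes constant time on average, so the cost per result depends on the number of labels in the host name and not on the size of the blacklist. In the loop from the question, the same check could guard the results.append call.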