I am using Selenium for web scraping, and I need to limit data consumption by blocking specific file types or file names from being downloaded. I would like to block them with pattern filters (wildcards or regex), such as:
*.MP4
*.css
*ads.google.com*
So far I have not found a solution. I would prefer a JavaScript one, if possible.
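I found that this can be done by using a Chrome extension as middleware. The extension needs the webRequest and webRequestBlocking permissions in its manifest. In particular, in the extension's background script you can use chrome.webRequest.onBeforeRequest to inspect and filter every single request: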
// Substrings to block. All entries are lowercase because the URL is
// lowercased before matching (a mixed-case entry would never match).
const blocked = ['.css', '.gif', '.png', '.jpg', '.jpeg', '.webm', '.webp', '.mp4', '.svg',
  'allheadernonblocking.js', 'allheader.js?', '/analytics.js', 'googletagmanager', 'calleo-livechat'];

chrome.webRequest.onBeforeRequest.addListener(
  function(info) {
    const url = info.url.toLowerCase();
    // Cancel the request if the URL contains any blocked substring.
    return {cancel: blocked.some(function(s) { return url.includes(s); })};
  },
  {
    urls: ["<all_urls>"]
  },
  ["blocking"]
);
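The wildcard filters from the question (*.MP4, *.css, *ads.google.com*) could also be used directly instead of substring checks. The following is only a minimal sketch: globToRegExp and shouldBlock are hypothetical helper names, not part of the Chrome API, and the sketch assumes that "*" means "any run of characters" and that a pattern has to match the whole URL.

```javascript
// Hypothetical helper: turn a wildcard pattern such as "*.mp4" into a
// case-insensitive RegExp. "*" matches any run of characters; every other
// character is matched literally.
function globToRegExp(pattern) {
  const escaped = pattern
    .split('*')
    .map(function (part) {
      // Escape characters that have a special meaning in a RegExp.
      return part.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('.*');
  // Anchored: the pattern must cover the entire URL.
  return new RegExp('^' + escaped + '$', 'i');
}

// The patterns listed in the question.
const patterns = ['*.MP4', '*.css', '*ads.google.com*'].map(globToRegExp);

// Hypothetical helper: true if any pattern matches the URL.
function shouldBlock(url) {
  return patterns.some(function (re) { return re.test(url); });
}
```

Inside the onBeforeRequest listener, the return value would then become {cancel: shouldBlock(info.url)}. Because the patterns are anchored, "*.MP4" only matches URLs that end in .mp4; to also catch URLs with a query string after the extension, the pattern would need a trailing wildcard, for example "*.mp4*".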