How to find the broken links in a website by providing its URL, for example 'www.hammacher.com'

I am using the code below to find the broken links on a web page. But if I want to check the whole website, including the internal links, how can I do that? Please advise. Thank you.

To check the broken links on a web page:

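import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Iterator;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

// driver must already have the page open; homePage is the site under test,
// e.g. "https://www.hammacher.com"
public void checkLinksOnPage(WebDriver driver, String homePage) {
    List<WebElement> links = driver.findElements(By.tagName("a"));
    Iterator<WebElement> it = links.iterator();

    while (it.hasNext()) {
        String url = it.next().getAttribute("href");
        System.out.println(url);

        if (url == null || url.isEmpty()) {
            System.out.println("URL is either not configured for anchor tag or it is empty");
            continue;
        }

        // Only check links that belong to the site under test
        if (!url.startsWith(homePage)) {
            System.out.println("URL belongs to another domain, skipping it.");
            continue;
        }

        try {
            HttpURLConnection huc = (HttpURLConnection) new URL(url).openConnection();
            huc.setRequestMethod("HEAD"); // fetch headers only, not the page body
            huc.connect();

            int respCode = huc.getResponseCode();

            // 4xx (client error) and 5xx (server error) mean the link is broken
            if (respCode >= 400) {
                System.out.println(url + " is a broken link");
            } else {
                System.out.println(url + " is a valid link");
            }
            huc.disconnect();
        } catch (MalformedURLException e) {
            System.out.println(url + " is not a valid URL: " + e.getMessage());
        } catch (IOException e) {
            System.out.println(url + " could not be checked: " + e.getMessage());
        }
    }
}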
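One caveat about the url.startsWith(homePage) filter in the code above: a plain string prefix also matches a different host such as https://www.hammacher.com.example.org/, and it rejects the same site under another scheme (http:// when homePage starts with https://). The sketch below is one stricter alternative, assuming that "internal" means an http(s) link whose host equals the home page's host; it parses both strings with java.net.URI and compares hosts. LinkFilter and isInternalHttpLink are hypothetical names, and subdomains (e.g. shop.hammacher.com) are deliberately treated as external here.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class LinkFilter {

    /**
     * Hypothetical helper: true only for http(s) links whose host matches
     * the host of homePage (case-insensitive). Subdomains count as external.
     */
    public static boolean isInternalHttpLink(String href, String homePage) {
        if (href == null || href.isEmpty()) {
            return false; // anchor without an href
        }
        try {
            URI link = new URI(href);
            URI home = new URI(homePage);
            String scheme = link.getScheme();
            // Rejects javascript:, mailto:, tel: and scheme-less (relative) hrefs
            if (scheme == null
                    || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return false;
            }
            return link.getHost() != null
                    && link.getHost().equalsIgnoreCase(home.getHost());
        } catch (URISyntaxException e) {
            return false; // href or homePage is not a parseable URI
        }
    }

    public static void main(String[] args) {
        String home = "https://www.hammacher.com";
        System.out.println(isInternalHttpLink("https://www.hammacher.com/shop", home));         // true
        System.out.println(isInternalHttpLink("http://WWW.hammacher.com/", home));             // true
        System.out.println(isInternalHttpLink("https://www.hammacher.com.example.org/", home)); // false
        System.out.println(isInternalHttpLink("javascript:void(0)", home));                    // false
        System.out.println(isInternalHttpLink(null, home));                                    // false
    }
}
```

In the loop above, the startsWith test could be replaced by a call to this helper, which would also absorb the separate null/empty check.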

Your approach was perfect. Once you have retrieved the href attributes from the <a> tags, you can check the status of each link by writing a function that accepts the href as an argument and prints the relevant status, as follows:

  • Function to check the status of the links:

     // Requires: java.io.IOException, java.net.HttpURLConnection, java.net.URL
     private void CheckingLink(String linkURL) {
         try {
             URL url = new URL(linkURL);
             HttpURLConnection httpUrlConnect = (HttpURLConnection) url.openConnection();
             httpUrlConnect.setConnectTimeout(5000); // give up connecting after 5 seconds
             httpUrlConnect.connect();

             int code = httpUrlConnect.getResponseCode();
             String message = httpUrlConnect.getResponseMessage(); // e.g. "OK", "Not Found"
             if (code >= 400) {
                 // 4xx/5xx (e.g. 402, 404 = HttpURLConnection.HTTP_NOT_FOUND, 500): broken link
                 System.out.println(linkURL + " - " + message + " - " + code);
             } else {
                 System.out.println(linkURL + " - " + message);
             }
             httpUrlConnect.disconnect();
         } catch (IOException e) {
             // Also catches MalformedURLException, e.g. for null, empty or "javascript:" hrefs
             System.out.println(e.getMessage());
         }
     }
  • Calling the function CheckingLink():

     List<WebElement> elements = driver.findElements(By.tagName("a"));
     System.out.println("Number of WebElements on this page : " + elements.size());
     for (WebElement ele : elements) {
         String url = ele.getAttribute("href");
         CheckingLink(url);
     }
  • Running this against https://in.yahoo.com/?p=us produces the following console output:

     Number of WebElements on this page : 105
     https://in.yahoo.com/ - OK
     https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
     https://in.news.yahoo.com/ - OK
     https://cricket.yahoo.com/ - OK
     https://in.finance.yahoo.com/ - OK
     https://in.style.yahoo.com/tagged/celebrity - OK
     https://in.style.yahoo.com/tagged/movies - OK
     https://in.style.yahoo.com/ - OK
     https://in.mobile.yahoo.com/ - OK
     https://in.yahoo.com/everything/ - OK
     https://in.answers.yahoo.com/ - OK
     https://in.groups.yahoo.com/ - OK
     https://in.messenger.yahoo.com/ - OK
     https://in.news.yahoo.com/weather - OK
     https://in.yahoo.com/everything/world - OK
     https://in.yahoo.com/ - OK
     https://login.yahoo.com/config/login?.src=fpctx&.intl=in&.lang=en-IN&.done=https%3A%2F%2Fin.yahoo.com - OK
     https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
     https://login.yahoo.com/config/login?.src=fpctx&.intl=in&.lang=en-IN&.done=https%3A%2F%2Fin.yahoo.com - OK
     https://in.yahoo.com/?p=us#mega-bottombar-mail - OK
     https://in.yahoo.com/?p=us#Main - OK
     https://in.yahoo.com/?p=us#Aside - OK
     https://mail.yahoo.com/?.intl=in&.lang=en-IN&.partner=none&.src=fp - OK
     https://cricket.yahoo.com/ - OK
     https://in.news.yahoo.com/ - OK
     https://in.finance.yahoo.com/ - OK
     https://in.style.yahoo.com/ - OK
     https://in.style.yahoo.com/tagged/movies - OK
     https://in.style.yahoo.com/tagged/celebrity - OK
     http://in.travelinspirations.yahoo.com/ - OK
     https://in.yahoo.com/everything/ - OK
     https://in.news.yahoo.com/video/32-episode-1-095405056.html - OK
     https://cricket.yahoo.net/scores/india-vs-afghanistan-oneoff-test-14th-june-2018-inaf06142018185950-summary - OK
     https://cricket.yahoo.net/scores/india-vs-afghanistan-oneoff-test-14th-june-2018-inaf06142018185950-summary - OK
     https://in.news.yahoo.com/fed-bengaluru-traffic-techie-rides-085447032.html - OK
     https://in.news.yahoo.com/photos-eid-ul-fitr-celebrations-slideshow-wp-095013253.html - OK
     https://in.style.yahoo.com/quick-look-actor-plays-race-slideshow-wp-102506088.html - OK
     https://in.style.yahoo.com/five-crucial-things-know-blood-103318158.html - OK
     https://in.news.yahoo.com/boy-america-contracts-bubonic-plague-113108819.html - OK
     https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
     https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
     https://in.style.yahoo.com/janhvi-khushi-anshula-holiday-london-dad-boney-kapoor-064018621.html - OK
     https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
     https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
     https://info.yahoo.com/privacy/us/yahoo/relevantads.html - OK
     https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
     https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=8j5uUzIGIS8bthoOIIlefINlCyUX0sMagCIuZQ05jmBfB74DwldI_rYOX1OS5kBByKf6VXv1ZfletO8DFuwVrss1EH7zcp7sC3mOkIDCDckHezCh6uetN9gABHeBIVJhY_Gh2YQZYlGcNjg0Ls4p9bZZt6jMNKDm_Deq0awAlb3iWN9MmuRf_3FnL8iztj2LLuB2G4qXUU5aZe_8bv54J3eChnAjgZEpXOjwZ0PX.aDMFrGxPY80WmXuIOd_k7ddLrVufsMXvVGZDkbqPaoyUidc2jukZlTGmbtJsq9PgokEscfHPYWw4KjDZT4js_9x74ME6IB.Pg3f6zuO1S6cb9kuc7WZ6wtRj73lilaXMuXv_mp5N7HB1USXa0Qy.S.PSZOX7kxczmPfD7znequq2Cova59KLDCDgj_kJM8zAGMKDrm7iWBTQuVlpY_lfv5YibTeKfJRtmJYnkJQ.XakDf6k6gOLWmWkJjuA9pVDUZKkMrCXwY8yRInyKIoMPMdPDa4kRIh1ghW2K7VLJfjGu6qXW1kPGFVRTF0wKkN4JKY4J.TLPlSEI9uuudXnam8OY5RZJA--%26lp= - OK
     unknown protocol: javascript
     https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
     https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
     https://in.finance.yahoo.com/news/salman-khan-katrina-kaif-sonakshi-052512176.html - OK
     https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
     https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
     https://in.news.yahoo.com/rihanna-narrowly-avoids-wardrobe-malfunction-135255635.html - OK
     https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
     https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
     https://in.style.yahoo.com/dipika-kakar-set-first-eid-marriage-green-sharara-052512000.html - OK
     https://info.yahoo.com/privacy/us/yahoo/relevantads.html - OK
     unknown protocol: javascript
     https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
     https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
     https://in.style.yahoo.com/neha-kakkar-apologises-her-man-himansh-kohli-rude-073156251.html - OK
     https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
     https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
     https://in.news.yahoo.com/alia-bhatt-apos-sister-shaheen-031551577.html - OK
     https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
     https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
     https://in.news.yahoo.com/apos-why-love-island-contestants-183329153.html - OK
     https://in.search.yahoo.com/search?p=India%20vs%20Afghanistan%202018&fr=fp-tts&fr2=ps - OK
     https://in.search.yahoo.com/search?p=Bajrang%20Dal%20VHP%20CIA&fr=fp-tts&fr2=ps - OK
     https://in.search.yahoo.com/search?p=Shujaat%20Bukhari&fr=fp-tts&fr2=ps - OK
     https://in.search.yahoo.com/search?p=Dhivya%20Suryadevara&fr=fp-tts&fr2=ps - OK
     https://in.search.yahoo.com/search?p=Luxury%20watches&fr=fp-tts&fr2=ps - OK
     https://in.search.yahoo.com/search?p=FIFA%20World%20Cup%202018&fr=fp-tts&fr2=ps - OK
     https://in.search.yahoo.com/search?p=UN%20Kashmir%20report&fr=fp-tts&fr2=ps - OK
     https://in.search.yahoo.com/search?p=AAP%20dharna&fr=fp-tts&fr2=ps - OK
     https://in.search.yahoo.com/search?p=Sanju%20poster&fr=fp-tts&fr2=ps - OK
     https://in.search.yahoo.com/search?p=Race%203&fr=fp-tts&fr2=ps - OK
     https://weather.yahoo.com/ - OK
     https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
     https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
     https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
     https://in.news.yahoo.com/weather/in/maharashtra/pune-2295412/ - OK
     null
     null
     null
     https://cricket.yahoo.com/ - OK
     https://cricket.yahoo.com/ - OK
     https://cricket.yahoo.com/ - OK
     no protocol:
     https://in.news.yahoo.com/ - OK
     https://in.style.yahoo.com/bengalureans-force-bbmp-re-look-bizarre-new-pet-licensing-bye-laws-notwithoutmydog-movement-095558668.html - OK
     https://in.news.yahoo.com/photos-eid-ul-fitr-celebrations-slideshow-wp-095013253.html - OK
     https://in.news.yahoo.com/photos-football-frenzy-grips-russia-slideshow-wp-085232287.html - OK
     https://policies.yahoo.com/in/en/yahoo/privacy/index.htm - OK
     http://in.advertising.yahoo.com/ - OK
     careers.yahoo.com
     https://in.help.yahoo.com/kb/helpcentral - OK
     https://yahoo.uservoice.com/forums/206294-india-homepage - OK
     PASSED: getLinks

     ===============================================
     Default test
     Tests run: 1, Failures: 0, Skips: 0
     ===============================================

     ===============================================
     Default suite
     Total tests run: 1, Failures: 0, Skips: 0
     ===============================================
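The function above checks the links of a single page. For the question's actual goal, every internal page of the site, one common approach is a breadth-first crawl: start from the home page, check each link found there, and queue every healthy internal link that has not been seen before so that its own links get checked too. The following is a minimal sketch, assuming you supply the page access yourself: linksOnPage returns the hrefs on a page (with Selenium this could be driver.get(page) followed by the findElements(By.tagName("a")) loop above), statusOf returns the HTTP status for a URL (for example via the HttpURLConnection code above), and isInternal decides which links to follow. SiteCrawler, findBrokenLinks, these parameters and the maxPages cap are hypothetical names for illustration, not part of Selenium.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.ToIntFunction;

public class SiteCrawler {

    /**
     * Breadth-first crawl from startUrl; returns every link whose status is >= 400.
     * linksOnPage, statusOf and isInternal are caller-supplied (hypothetical hooks).
     */
    public static List<String> findBrokenLinks(String startUrl,
                                               Function<String, List<String>> linksOnPage,
                                               ToIntFunction<String> statusOf,
                                               Predicate<String> isInternal,
                                               int maxPages) {
        Set<String> checked = new HashSet<>();      // URLs already seen, so each is requested once
        Deque<String> toVisit = new ArrayDeque<>(); // internal pages whose links are still unread
        List<String> broken = new ArrayList<>();

        checked.add(startUrl);
        toVisit.add(startUrl);
        int visited = 0;

        while (!toVisit.isEmpty() && visited < maxPages) {
            String page = toVisit.poll();
            visited++;
            for (String link : linksOnPage.apply(page)) {
                if (link == null || !checked.add(link)) {
                    continue; // no href, or already checked from another page
                }
                if (statusOf.applyAsInt(link) >= 400) {
                    broken.add(link);  // report it; a broken page has no links to read
                } else if (isInternal.test(link)) {
                    toVisit.add(link); // healthy internal page: crawl its links later
                }
                // healthy external links are checked but never crawled
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Tiny in-memory "site" standing in for real pages (Java 9+ for Map.of/List.of)
        Map<String, List<String>> site = Map.of(
                "/home", List.of("/a", "/b", "https://other.example/"),
                "/a", List.of("/home", "/missing"),
                "/b", List.of("/a"));

        List<String> broken = findBrokenLinks("/home",
                page -> site.getOrDefault(page, List.of()),
                url -> (site.containsKey(url) || url.startsWith("https://other")) ? 200 : 404,
                url -> url.startsWith("/"),
                100);
        System.out.println(broken); // prints [/missing]
    }
}
```

Keeping checked as a set means each URL is requested only once even when many pages link to it, which the console output above shows happens often (the same article links appear three times on one page); maxPages bounds the run on very large sites.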

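Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.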

© 2020-2024 STACKOOM.COM