Exception in thread "main" java.net.MalformedURLException: no protocol error while finding broken links in a page using Selenium and Java

I am trying to find the broken links on a page with Selenium (Java), but the code fails with the exception below and never finishes. The code first collects all the links on the page and then checks the URL of each one. Please take a look and suggest a fix.

Exception in thread "main" java.net.MalformedURLException: no protocol: 
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at fire.Weil.main(Weil.java:57)

My code is:

package fire;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Weil {

    public static void main(String[] args) throws MalformedURLException, IOException{

        System.setProperty("webdriver.gecko.driver", "C:\\Users\\sumitk\\Downloads\\Selenium Drivers\\Gecodriver\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();

        //delete all cookies
        driver.manage().deleteAllCookies();

        //dynamic wait
        driver.manage().timeouts().pageLoadTimeout(30, TimeUnit.SECONDS);
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);

        //open site
        driver.get("https://www.weil.com/");

        //1. get the list of all the links and images
        List<WebElement> linklist = driver.findElements(By.tagName("a"));
        linklist.addAll(driver.findElements(By.tagName("img")));

        System.out.println("Size of full links and images--->"+ linklist.size());

        List<WebElement> activeLinks =new ArrayList<WebElement>();

        // 2. iterate linklist : exclude all the links/images does not have any href attribute
        for(int i=0; i<linklist.size(); i++)
        {
            System.out.println(linklist.get(i).getAttribute("href"));
            if(linklist.get(i).getAttribute("href") !=null)
            {
                activeLinks.add(linklist.get(i));
            }
        }

        //get the size of active links list.
        System.out.println("Size of active links and images--->"+ activeLinks.size());

        //3. check the href url, with httpconnection api.
        for(int j=0; j<activeLinks.size(); j++)
        {
            HttpURLConnection connection=(HttpURLConnection) new URL(activeLinks.get(j).getAttribute("href")).openConnection();
            connection.connect();
            String response=connection.getResponseMessage();
            connection.disconnect();
            System.out.println(activeLinks.get(j).getAttribute("href") +" --->"+response);
        }
    }

}

This error message...

Exception in thread "main" java.net.MalformedURLException: no protocol:

...implies that your program tried to open a URL that has no protocol, i.e. the http or https scheme is absent. Nothing follows "no protocol:" in the message, so the string passed to the URL constructor was itself empty.
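The behaviour can be reproduced without Selenium. The following is a minimal sketch; the class name NoProtocolDemo and the helper parseError are hypothetical names introduced only for illustration. It passes a few strings to the java.net.URL constructor and reports the exception message, if any.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class; not part of the original post.
class NoProtocolDemo {

    // Returns the exception message if the string cannot be parsed as a URL,
    // or null if java.net.URL accepts it.
    static String parseError(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // An empty string and a relative path both lack a protocol.
        System.out.println("[" + parseError("") + "]");
        System.out.println("[" + parseError("/people") + "]");
        // An absolute URL with a scheme parses without an exception.
        System.out.println("[" + parseError("https://www.weil.com/") + "]");
    }
}
```

The brackets in the printed lines only make it visible that nothing follows "no protocol: " when the input is empty, which matches the stack trace in the question.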

Your logic was near perfect. A few words:

  • Some of the <a> elements on the webpage https://www.weil.com/ have an href attribute with no value assigned. As an example:

    • <a class="canvas-button ss-icon" href="">?</a>
    • <a class="search-button ss-icon" href="">Search</a>
  • An empty href is not null, so these elements pass your null check. That is the reason they are counted by this line:

     System.out.println("Size of active links and images--->"+ activeLinks.size()); //prints: Size of active links and images--->72
  • But if you print the href attribute:

     for(int i=0; i<activeLinks.size(); i++) System.out.println(activeLinks.get(i).getAttribute("href"));
  • The first two lines are blank, as follows:

     <blank>
     <blank>
     https://www.weil.com/
     https://www.weil.com/
     https://www.weil.com/people
  • I made a couple of simple tweaks to your code, as follows:

    • Replaced findElements(By.tagName("a")) with findElements(By.xpath("//a[contains (@href, 'weil')]"))
    • Replaced findElements(By.tagName("img")) with findElements(By.xpath("//img[contains (@src, 'weil')]"))
  • Here is the execution result:

    • Code Block:

       import java.io.IOException;
       import java.net.HttpURLConnection;
       import java.net.URL;
       import java.util.ArrayList;
       import java.util.Collections;
       import java.util.List;

       import org.openqa.selenium.By;
       import org.openqa.selenium.WebDriver;
       import org.openqa.selenium.WebElement;
       import org.openqa.selenium.chrome.ChromeDriver;
       import org.openqa.selenium.chrome.ChromeOptions;

       public class A_Chrome_Demo {

           public static void main(String[] args) throws IOException {

               System.setProperty("webdriver.chrome.driver", "C:\\Utility\\BrowserDrivers\\chromedriver.exe");
               ChromeOptions options = new ChromeOptions();
               options.addArguments("start-maximized");
               options.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
               options.setExperimentalOption("useAutomationExtension", false);
               WebDriver driver = new ChromeDriver(options);
               driver.get("https://www.weil.com/");

               // collect only the links and images whose href/src contains 'weil'
               List<WebElement> linklist = driver.findElements(By.xpath("//a[contains (@href, 'weil')]"));
               linklist.addAll(driver.findElements(By.xpath("//img[contains (@src, 'weil')]")));
               System.out.println("Size of full links and images--->"+ linklist.size());

               // <img> elements have no href, so they print null and are dropped here
               List<WebElement> activeLinks = new ArrayList<WebElement>();
               for(int i=0; i<linklist.size(); i++)
               {
                   System.out.println(linklist.get(i).getAttribute("href"));
                   if(linklist.get(i).getAttribute("href") != null)
                       activeLinks.add(linklist.get(i));
               }
               System.out.println("Size of active links and images--->"+ activeLinks.size());

               // request each href and print the HTTP response message
               for(int j=0; j<activeLinks.size(); j++)
               {
                   HttpURLConnection connection = (HttpURLConnection) new URL(activeLinks.get(j).getAttribute("href")).openConnection();
                   connection.connect();
                   String response = connection.getResponseMessage();
                   connection.disconnect();
                   System.out.println(activeLinks.get(j).getAttribute("href") +" --->"+response);
               }
           }
       }
    • Console Output:

       Size of full links and images--->46
       https://www.weil.com/about-weil
       https://extranet.weil.com/
       https://login.weil.com/
       https://www.weil.com/articles/weil-elects-16-new-partners-and-announces-new-counsel-class-2019
       https://www.weil.com/articles/weil-announces-weil-legal-innovators-program
       https://www.weil.com/articles/weil-partners-receive-top-honors-in-2019
       https://www.weil.com/articles/two-weil-partners-named-among-turnarounds-workouts-outstanding-restructuring-lawyers-for-2019
       https://careers.weil.com/
       https://www.weil.com/articles/weil-wins-five-2019-law360-practice-group-of-the-year-awards
       https://www.weil.com/articles/weil-earns-2020-litigation-department-of-the-year-honorable-mention-from-the-american-lawyer
       https://www.weil.com/articles/weil-leads-three-of-the-five-top-bankruptcy-cases-of-2019
       https://www.weil.com/about-weil/about-weil-prominent-matters
       https://www.weil.com/articles/weil-represented-french-state-in-landmark-privatization-and-ipo-of-francaise-des-jeux
       https://www.weil.com/articles/weil-litigators-clinch-four-win-week-showcasing-cross-departmental-strengths
       https://www.weil.com/articles/weil-advised-guggenheim-securities-and-morgan-stanley-on-jack-in-the-boxs-1-3b-securitization
       https://www.weil.com/about-weil/not-for-profit
       https://www.weil.com/articles/weil-secures-asylum-for-burkina-faso-native-escaping-persecution
       https://www.weil.com/articles/weils-2019-pro-bono-annual-review-our-finest-hours
       https://www.weil.com/articles/weil-and-nysba-task-force-deliver-report-on-wrongful-convictions-in-new-york-state
       https://www.weil.com/about-weil/diversity-and-inclusion
       https://www.weil.com/articles/weil-named-a-2020-best-place-to-work-for-lgbtq-equality
       https://www.weil.com/articles/three-weil-partners-named-best-practitioners-in-their-fields
       http://business-finance-restructuring.weil.com/
       http://eurorestructuring.weil.com/
       http://privateequity.weil.com/
       http://governance.weil.com/
       http://product-liability.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/latest-thinking/cryptoassets-hmrc-uk-tax-net-widens/
       http://business-finance-restructuring.weil.com/automatic-stay/denial-of-stay-relief-is-a-final-order-says-the-us-supreme-court/
       http://business-finance-restructuring.weil.com/news/weil-wins-five-2019-law360-practice-group-of-the-year-awards/
       https://www.weil.com/about-weil/green-policy
       https://www.weil.com/about-weil/sitemap
       https://www.weil.com/about-weil/privacy-policy
       https://www.weil.com/about-weil/privacy-shield-notice
       https://www.weil.com/about-weil/regulatory-information
       https://www.weil.com/about-weil/disclaimer
       null
       null
       null
       Size of active links and images--->43
       https://www.weil.com/about-weil --->OK
       https://extranet.weil.com/ --->OK
       https://login.weil.com/ --->OK
       https://www.weil.com/articles/weil-elects-16-new-partners-and-announces-new-counsel-class-2019 --->OK
       https://www.weil.com/articles/weil-announces-weil-legal-innovators-program --->OK
       https://www.weil.com/articles/weil-partners-receive-top-honors-in-2019 --->OK
       https://www.weil.com/articles/two-weil-partners-named-among-turnarounds-workouts-outstanding-restructuring-lawyers-for-2019 --->OK
       https://careers.weil.com/ --->OK
       https://www.weil.com/articles/weil-wins-five-2019-law360-practice-group-of-the-year-awards --->OK
       https://www.weil.com/articles/weil-earns-2020-litigation-department-of-the-year-honorable-mention-from-the-american-lawyer --->OK
       https://www.weil.com/articles/weil-leads-three-of-the-five-top-bankruptcy-cases-of-2019 --->OK
       https://www.weil.com/about-weil/about-weil-prominent-matters --->OK
       https://www.weil.com/articles/weil-represented-french-state-in-landmark-privatization-and-ipo-of-francaise-des-jeux --->OK
       https://www.weil.com/articles/weil-litigators-clinch-four-win-week-showcasing-cross-departmental-strengths --->OK
       https://www.weil.com/articles/weil-advised-guggenheim-securities-and-morgan-stanley-on-jack-in-the-boxs-1-3b-securitization --->OK
       https://www.weil.com/about-weil/not-for-profit --->OK
       https://www.weil.com/articles/weil-secures-asylum-for-burkina-faso-native-escaping-persecution --->OK
       https://www.weil.com/articles/weils-2019-pro-bono-annual-review-our-finest-hours --->OK
       https://www.weil.com/articles/weil-and-nysba-task-force-deliver-report-on-wrongful-convictions-in-new-york-state --->OK
       https://www.weil.com/about-weil/diversity-and-inclusion --->OK
       https://www.weil.com/articles/weil-named-a-2020-best-place-to-work-for-lgbtq-equality --->OK
       https://www.weil.com/articles/three-weil-partners-named-best-practitioners-in-their-fields --->OK
       http://business-finance-restructuring.weil.com/ --->Forbidden
       http://eurorestructuring.weil.com/ --->Forbidden
       http://privateequity.weil.com/ --->Forbidden
       http://governance.weil.com/ --->Forbidden
       http://product-liability.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/latest-thinking/cryptoassets-hmrc-uk-tax-net-widens/ --->Forbidden
       http://business-finance-restructuring.weil.com/automatic-stay/denial-of-stay-relief-is-a-final-order-says-the-us-supreme-court/ --->Forbidden
       http://business-finance-restructuring.weil.com/news/weil-wins-five-2019-law360-practice-group-of-the-year-awards/ --->Forbidden
       https://www.weil.com/about-weil/green-policy --->OK
       https://www.weil.com/about-weil/sitemap --->OK
       https://www.weil.com/about-weil/privacy-policy --->OK
       https://www.weil.com/about-weil/privacy-shield-notice --->OK
       https://www.weil.com/about-weil/regulatory-information --->OK
       https://www.weil.com/about-weil/disclaimer --->OK
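The XPath filter above is tied to the text 'weil'. A more general variant of the same idea is to keep only href values that are non-empty and start with an http or https scheme before handing them to new URL(...). The following is a rough sketch under that assumption; HrefFilter, isCheckable and checkable are hypothetical names, and the strings are assumed to come from getAttribute("href") in the calling code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not part of the original answer.
class HrefFilter {

    // True only for values that are non-null and start with http:// or https://.
    // Empty strings, "javascript:" pseudo-links and "mailto:" links are rejected.
    static boolean isCheckable(String href) {
        if (href == null) {
            return false;
        }
        String trimmed = href.trim().toLowerCase(Locale.ROOT);
        return trimmed.startsWith("http://") || trimmed.startsWith("https://");
    }

    // Returns the subset of hrefs that can be passed to new URL(...) safely.
    static List<String> checkable(List<String> hrefs) {
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            if (isCheckable(href)) {
                result.add(href);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = new ArrayList<>();
        sample.add("");                              // href="" as in the two icon links
        sample.add(null);                            // element without an href attribute
        sample.add("https://www.weil.com/people");
        sample.add("javascript:void(0)");
        System.out.println(checkable(sample));
    }
}
```

In the question's loop this would replace the `!= null` test, so that elements with an empty href never reach the activeLinks list.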

This happens because the web page contains <a> elements whose href attribute has no value, i.e. the list-drawer icon at the top left and the search icon.

Refer to the attached image.


Wrapping the URL creation in a try/catch block for java.net.MalformedURLException could help here: it lets you skip the offending link and move ahead with the rest of the flow.
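One possible shape for that try/catch is sketched below. SafeUrlParser and tryParse are hypothetical names introduced for illustration; the sketch only covers the parsing step and leaves the HttpURLConnection call from the question to the caller.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Optional;

// Hypothetical helper; not part of the original answer.
class SafeUrlParser {

    // Returns the parsed URL, or an empty Optional if the href is malformed.
    static Optional<URL> tryParse(String href) {
        try {
            return Optional.of(new URL(href));
        } catch (MalformedURLException e) {
            // Log and skip instead of letting the exception end the whole run.
            System.out.println("Skipping malformed href [" + href + "]: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] hrefs = {"", "https://www.weil.com/"};
        for (String href : hrefs) {
            // In the question's loop, the connection would be opened here.
            tryParse(href).ifPresent(url -> System.out.println("Would check: " + url));
        }
    }
}
```

Compared with filtering up front, this keeps the original loop almost unchanged, at the cost of discovering bad values only when they are parsed.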
