
Exception in thread "main" java.net.MalformedURLException: no protocol error while finding broken links in a page using Selenium and Java

I am trying to find the broken links on a page using Selenium (Java), but the code fails with the exception below. The code first collects all the links on the page and then reads the URL of each link. Please look at the issue and suggest a fix.

Exception in thread "main" java.net.MalformedURLException: no protocol: 
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at fire.Weil.main(Weil.java:57)

My code is:

package fire;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Weil {

    public static void main(String[] args) throws MalformedURLException, IOException{

        System.setProperty("webdriver.gecko.driver", "C:\\Users\\sumitk\\Downloads\\Selenium Drivers\\Gecodriver\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();

        //delete all cookies
        driver.manage().deleteAllCookies();

        //dynamic wait
        driver.manage().timeouts().pageLoadTimeout(30, TimeUnit.SECONDS);
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);

        //open site
        driver.get("https://www.weil.com/");

        //1. get the list of all the links and images
        List<WebElement> linklist = driver.findElements(By.tagName("a"));
        linklist.addAll(driver.findElements(By.tagName("img")));

        System.out.println("Size of full links and images--->"+ linklist.size());

        List<WebElement> activeLinks =new ArrayList<WebElement>();

        // 2. iterate linklist : exclude all the links/images does not have any href attribute
        for(int i=0; i<linklist.size(); i++)
        {
            System.out.println(linklist.get(i).getAttribute("href"));
            if(linklist.get(i).getAttribute("href") !=null)
            {
                activeLinks.add(linklist.get(i));
            }
        }

        //get the size of active links list.
        System.out.println("Size of active links and images--->"+ activeLinks.size());

        //3. check the href url, with httpconnection api.
        for(int j=0; j<activeLinks.size(); j++)
        {
            HttpURLConnection connection=(HttpURLConnection) new URL(activeLinks.get(j).getAttribute("href")).openConnection();
            connection.connect();
            String response=connection.getResponseMessage();
            connection.disconnect();
            System.out.println(activeLinks.get(j).getAttribute("href") +" --->"+response);
        }
    }

}

This error message...

Exception in thread "main" java.net.MalformedURLException: no protocol:

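...implies that your program tried to open a URL that has no protocol, i.e. the http:// or https:// prefix is missing. The text after "no protocol:" in the message is the string that was passed to the URL constructor; here it is empty, so the href being processed was an empty string.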
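The exception can be reproduced without Selenium. The following is a minimal sketch; the class name NoProtocolDemo and the helper parseError are made up for illustration, and only java.net.URL comes from the question's code.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class, not part of the original question.
class NoProtocolDemo {

    // Returns null when the string parses as a URL,
    // otherwise the message of the MalformedURLException.
    static String parseError(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // An absolute URL with a scheme parses without error.
        System.out.println(parseError("https://www.weil.com/"));
        // An empty href has no scheme, so the constructor rejects it.
        System.out.println(parseError(""));
        // A relative path is rejected for the same reason.
        System.out.println(parseError("/people"));
    }
}
```

The second call fails with a message that starts with "no protocol:", the same message shown in the stack trace.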

Your logic is nearly right. A few observations:

  • Some of the <a> elements on https://www.weil.com/ have an href attribute with no value assigned. For example:

    • <a class="canvas-button ss-icon" href="">?</a>
    • <a class="search-button ss-icon" href="">Search</a>
  • An empty href is not null, so these elements pass your null check. That is why this line counts them as active links:

     System.out.println("Size of active links and images--->"+ activeLinks.size()); //prints: Size of active links and images--->72
  • But if you print the href attribute of each active link:

     for(int i=0; i<activeLinks.size(); i++) System.out.println(activeLinks.get(i).getAttribute("href"));
  • The first two lines are blank:

     <blank>
     <blank>
     https://www.weil.com/
     https://www.weil.com/
     https://www.weil.com/people
  • I made a couple of simple tweaks in your code as follows:

    • Replaced findElements(By.tagName("a")) with findElements(By.xpath("//a[contains (@href, 'weil')]"))
    • Replaced findElements(By.tagName("img")) with findElements(By.xpath("//img[contains (@src, 'weil')]"))
  • Here is the execution result:

    • Code Block:

       import java.io.IOException;
       import java.net.HttpURLConnection;
       import java.net.URL;
       import java.util.ArrayList;
       import java.util.Collections;
       import java.util.List;

       import org.openqa.selenium.By;
       import org.openqa.selenium.WebDriver;
       import org.openqa.selenium.WebElement;
       import org.openqa.selenium.chrome.ChromeDriver;
       import org.openqa.selenium.chrome.ChromeOptions;

       public class A_Chrome_Demo {

           public static void main(String[] args) throws IOException {
               System.setProperty("webdriver.chrome.driver", "C:\\Utility\\BrowserDrivers\\chromedriver.exe");
               ChromeOptions options = new ChromeOptions();
               options.addArguments("start-maximized");
               options.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
               options.setExperimentalOption("useAutomationExtension", false);
               WebDriver driver = new ChromeDriver(options);
               driver.get("https://www.weil.com/");

               // collect only links and images whose href/src contains 'weil'
               List<WebElement> linklist = driver.findElements(By.xpath("//a[contains (@href, 'weil')]"));
               linklist.addAll(driver.findElements(By.xpath("//img[contains (@src, 'weil')]")));
               System.out.println("Size of full links and images--->"+ linklist.size());

               // keep the elements that have an href; img elements return null here and are skipped
               List<WebElement> activeLinks = new ArrayList<WebElement>();
               for(int i=0; i<linklist.size(); i++)
               {
                   System.out.println(linklist.get(i).getAttribute("href"));
                   if(linklist.get(i).getAttribute("href") != null)
                       activeLinks.add(linklist.get(i));
               }
               System.out.println("Size of active links and images--->"+ activeLinks.size());

               // request each href and print the response message
               for(int j=0; j<activeLinks.size(); j++)
               {
                   HttpURLConnection connection = (HttpURLConnection) new URL(activeLinks.get(j).getAttribute("href")).openConnection();
                   connection.connect();
                   String response = connection.getResponseMessage();
                   connection.disconnect();
                   System.out.println(activeLinks.get(j).getAttribute("href") +" --->"+response);
               }
           }
       }
    • Console Output:

       Size of full links and images--->46
       https://www.weil.com/about-weil
       https://extranet.weil.com/
       https://login.weil.com/
       https://www.weil.com/articles/weil-elects-16-new-partners-and-announces-new-counsel-class-2019
       https://www.weil.com/articles/weil-announces-weil-legal-innovators-program
       https://www.weil.com/articles/weil-partners-receive-top-honors-in-2019
       https://www.weil.com/articles/two-weil-partners-named-among-turnarounds-workouts-outstanding-restructuring-lawyers-for-2019
       https://careers.weil.com/
       https://www.weil.com/articles/weil-wins-five-2019-law360-practice-group-of-the-year-awards
       https://www.weil.com/articles/weil-earns-2020-litigation-department-of-the-year-honorable-mention-from-the-american-lawyer
       https://www.weil.com/articles/weil-leads-three-of-the-five-top-bankruptcy-cases-of-2019
       https://www.weil.com/about-weil/about-weil-prominent-matters
       https://www.weil.com/articles/weil-represented-french-state-in-landmark-privatization-and-ipo-of-francaise-des-jeux
       https://www.weil.com/articles/weil-litigators-clinch-four-win-week-showcasing-cross-departmental-strengths
       https://www.weil.com/articles/weil-advised-guggenheim-securities-and-morgan-stanley-on-jack-in-the-boxs-1-3b-securitization
       https://www.weil.com/about-weil/not-for-profit
       https://www.weil.com/articles/weil-secures-asylum-for-burkina-faso-native-escaping-persecution
       https://www.weil.com/articles/weils-2019-pro-bono-annual-review-our-finest-hours
       https://www.weil.com/articles/weil-and-nysba-task-force-deliver-report-on-wrongful-convictions-in-new-york-state
       https://www.weil.com/about-weil/diversity-and-inclusion
       https://www.weil.com/articles/weil-named-a-2020-best-place-to-work-for-lgbtq-equality
       https://www.weil.com/articles/three-weil-partners-named-best-practitioners-in-their-fields
       http://business-finance-restructuring.weil.com/
       http://eurorestructuring.weil.com/
       http://privateequity.weil.com/
       http://governance.weil.com/
       http://product-liability.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/
       https://tax.weil.com/latest-thinking/cryptoassets-hmrc-uk-tax-net-widens/
       http://business-finance-restructuring.weil.com/automatic-stay/denial-of-stay-relief-is-a-final-order-says-the-us-supreme-court/
       http://business-finance-restructuring.weil.com/news/weil-wins-five-2019-law360-practice-group-of-the-year-awards/
       https://www.weil.com/about-weil/green-policy
       https://www.weil.com/about-weil/sitemap
       https://www.weil.com/about-weil/privacy-policy
       https://www.weil.com/about-weil/privacy-shield-notice
       https://www.weil.com/about-weil/regulatory-information
       https://www.weil.com/about-weil/disclaimer
       null
       null
       null
       Size of active links and images--->43
       https://www.weil.com/about-weil --->OK
       https://extranet.weil.com/ --->OK
       https://login.weil.com/ --->OK
       https://www.weil.com/articles/weil-elects-16-new-partners-and-announces-new-counsel-class-2019 --->OK
       https://www.weil.com/articles/weil-announces-weil-legal-innovators-program --->OK
       https://www.weil.com/articles/weil-partners-receive-top-honors-in-2019 --->OK
       https://www.weil.com/articles/two-weil-partners-named-among-turnarounds-workouts-outstanding-restructuring-lawyers-for-2019 --->OK
       https://careers.weil.com/ --->OK
       https://www.weil.com/articles/weil-wins-five-2019-law360-practice-group-of-the-year-awards --->OK
       https://www.weil.com/articles/weil-earns-2020-litigation-department-of-the-year-honorable-mention-from-the-american-lawyer --->OK
       https://www.weil.com/articles/weil-leads-three-of-the-five-top-bankruptcy-cases-of-2019 --->OK
       https://www.weil.com/about-weil/about-weil-prominent-matters --->OK
       https://www.weil.com/articles/weil-represented-french-state-in-landmark-privatization-and-ipo-of-francaise-des-jeux --->OK
       https://www.weil.com/articles/weil-litigators-clinch-four-win-week-showcasing-cross-departmental-strengths --->OK
       https://www.weil.com/articles/weil-advised-guggenheim-securities-and-morgan-stanley-on-jack-in-the-boxs-1-3b-securitization --->OK
       https://www.weil.com/about-weil/not-for-profit --->OK
       https://www.weil.com/articles/weil-secures-asylum-for-burkina-faso-native-escaping-persecution --->OK
       https://www.weil.com/articles/weils-2019-pro-bono-annual-review-our-finest-hours --->OK
       https://www.weil.com/articles/weil-and-nysba-task-force-deliver-report-on-wrongful-convictions-in-new-york-state --->OK
       https://www.weil.com/about-weil/diversity-and-inclusion --->OK
       https://www.weil.com/articles/weil-named-a-2020-best-place-to-work-for-lgbtq-equality --->OK
       https://www.weil.com/articles/three-weil-partners-named-best-practitioners-in-their-fields --->OK
       http://business-finance-restructuring.weil.com/ --->Forbidden
       http://eurorestructuring.weil.com/ --->Forbidden
       http://privateequity.weil.com/ --->Forbidden
       http://governance.weil.com/ --->Forbidden
       http://product-liability.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/ --->Forbidden
       https://tax.weil.com/latest-thinking/cryptoassets-hmrc-uk-tax-net-widens/ --->Forbidden
       http://business-finance-restructuring.weil.com/automatic-stay/denial-of-stay-relief-is-a-final-order-says-the-us-supreme-court/ --->Forbidden
       http://business-finance-restructuring.weil.com/news/weil-wins-five-2019-law360-practice-group-of-the-year-awards/ --->Forbidden
       https://www.weil.com/about-weil/green-policy --->OK
       https://www.weil.com/about-weil/sitemap --->OK
       https://www.weil.com/about-weil/privacy-policy --->OK
       https://www.weil.com/about-weil/privacy-shield-notice --->OK
       https://www.weil.com/about-weil/regulatory-information --->OK
       https://www.weil.com/about-weil/disclaimer --->OK
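The XPath filter on 'weil' also drops any link that points outside weil.com. If you want to keep those, another option is to filter the href strings themselves before building a URL. The following is only a sketch under that assumption: the class HrefFilter and the method checkable are hypothetical names, and the rule "keep only http and https" is my choice, not something from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the original code.
class HrefFilter {

    // Keeps only hrefs that are non-null and start with http:// or https://
    // after trimming whitespace. Empty strings, mailto:, javascript: and
    // similar values are dropped.
    static List<String> checkable(List<String> hrefs) {
        List<String> result = new ArrayList<>();
        for (String href : hrefs) {
            if (href == null) {
                continue;
            }
            String trimmed = href.trim();
            String lower = trimmed.toLowerCase(Locale.ROOT);
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values standing in for getAttribute("href") results.
        List<String> hrefs = Arrays.asList(
                "", null, "https://www.weil.com/", "javascript:void(0)", "https://www.weil.com/people");
        System.out.println(checkable(hrefs));
    }
}
```

In the Selenium loop you would collect getAttribute("href") from each element into a list of strings and pass that list through checkable before opening any connection.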

This happens because the web page contains <a> elements whose href attribute has no usable value,

i.e. the list-drawer icon at the top left and the search icon.

Refer to the attached image.

[image]

Wrapping the URL construction in a try/catch block for java.net.MalformedURLException should let you skip such links and continue with the rest of the flow.

[image]
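The try/catch idea could look roughly like the sketch below. It works on plain href strings rather than WebElements so that it runs without a browser. The names SafeLinkCheck, Probe, httpStatus and checkAll are hypothetical, and the sketch assumes every well-formed href is an http or https URL (the cast to HttpURLConnection would fail for other schemes such as mailto:).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the answerer's actual code.
class SafeLinkCheck {

    // Pluggable status lookup so the loop can be tried without network access.
    interface Probe {
        String status(URL url) throws IOException;
    }

    // Real lookup: the same HttpURLConnection calls as in the question.
    static String httpStatus(URL url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try {
            connection.connect();
            return connection.getResponseMessage();
        } finally {
            connection.disconnect();
        }
    }

    // Checks every href; a malformed one is reported and skipped
    // instead of aborting the whole run.
    static List<String> checkAll(List<String> hrefs, Probe probe) {
        List<String> report = new ArrayList<>();
        for (String href : hrefs) {
            try {
                URL url = new URL(href);
                report.add(href + " --->" + probe.status(url));
            } catch (MalformedURLException e) {
                report.add("skipped [" + href + "]: " + e.getMessage());
            } catch (IOException e) {
                report.add(href + " --->request failed: " + e.getMessage());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // The empty string stands in for an <a href=""> element.
        List<String> hrefs = Arrays.asList("", "https://www.weil.com/");
        for (String line : checkAll(hrefs, SafeLinkCheck::httpStatus)) {
            System.out.println(line);
        }
    }
}
```

Catching the exception keeps the loop going, but it also hides the empty links. Logging the skipped entries, as done here, makes them visible in the report.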
