
How to extract a form tag using HtmlAgilityPack?

I'm using HtmlAgilityPack in one of my C# projects for scraping. I need to scrape the <form> tag from a web page. I've searched for how to extract the form tag using HtmlAgilityPack but couldn't find an answer. Can anyone tell me how to extract the <form> tag using HtmlAgilityPack?

private void Testing()
{
    var getHtmlWeb = new HtmlWeb();
    var document = getHtmlWeb.Load(@"http://localhost/final_project/index.php");
    HtmlNode.ElementsFlags.Remove("form");
    var aTags = document.DocumentNode.SelectNodes("//form");
    int counter = 1;
    StringBuilder buffer = new StringBuilder();
    if (aTags != null)
    {
        foreach (var aTag in aTags)
        {
            buffer.Append(counter + ". " + aTag.InnerHtml + " - " + "\t" + "<br />");
            counter++;
        }
    }
}

Here is my code sample. I'm scraping a page from my localhost. The count of aTags is 1 because there is only one form on the page, but my StringBuilder object doesn't contain any InnerHtml of the form. Where is the error?

Here is the HTML source from which I want to scrape the form:

<!DOCTYPE html>
<html>
    <head>
    <!-- stylesheet section -->
    <link rel="stylesheet" type="text/css" media="all" href="./_include/style.css">

    <!-- title of the page -->
    <title>Login</title>

    <!-- PHP Section -->
    <!-- Creating a connection with database-->
     <!-- end of PHP Section -->

    </head>
        <body>
            <!-- now we'll check error variable to print warning -->
                        <!-- we'll submit the data to the same page to avoid excessive pages -->
            <form action="/final_project/index.php" method="post">
                <!-- ============================== Fieldset 1 ============================== -->
                <fieldset>
                    <legend>Log in credentials:</legend>
                    <hr class="hrzntlrow" />
                        <label for="input-one"><strong>User Name:</strong></label><br />
                        <input autofocus name="userName" type="text" size="20" id="input-one" class="text" placeholder="User Name" required /><br />

                        <label for="input-two"><strong>Password:</strong></label><br />
                        <input name="password" type="password" size="20" id="input-two" class="text" placeholder="Password" required />
                </fieldset>
                <!-- ============================== Fieldset 1 end ============================== -->

                <p><input type="submit" alt="SUBMIT" name="submit" value="SUBMIT" class="submit-text" /></p>
            </form>
        </body>
</html>

Since form tags are allowed to overlap, HAP handles them differently from other elements. To treat form tags like any other element, remove the form flag before loading the document by calling:

HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("form");

Now your form tags will be handled as you expect, and you can work with them the way you work with other tags.
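As a minimal sketch of the ordering that matters here: the flag is consulted while the document is being parsed, so it likely has to be removed before the HTML is loaded, not afterwards as in the question's code. The class name FormScraper, the helper ExtractInputNames, and the inline HTML string are hypothetical and only for illustration; the sketch assumes the HtmlAgilityPack NuGet package is referenced, and the exact default handling of form may differ between HAP versions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class FormScraper
{
    // Hypothetical helper: returns the name attribute of every <input>
    // found inside the first <form> of the given HTML.
    public static List<string> ExtractInputNames(string html)
    {
        // Remove the special-case flag BEFORE parsing, so the parser
        // builds <form> as an ordinary element with child nodes.
        HtmlNode.ElementsFlags.Remove("form");

        var document = new HtmlDocument();
        document.LoadHtml(html);

        HtmlNode form = document.DocumentNode.SelectSingleNode("//form");
        if (form == null)
        {
            // No form on the page: return an empty list rather than null.
            return new List<string>();
        }

        return form.Descendants("input")
                   .Select(input => input.GetAttributeValue("name", string.Empty))
                   .ToList();
    }

    public static void Main()
    {
        // Simplified stand-in for the login page in the question.
        const string html = @"<html><body>
<form action=""/final_project/index.php"" method=""post"">
  <input name=""userName"" type=""text"" />
  <input name=""password"" type=""password"" />
  <input name=""submit"" type=""submit"" value=""SUBMIT"" />
</form>
</body></html>";

        foreach (string name in ExtractInputNames(html))
        {
            Console.WriteLine(name);
        }
    }
}
```

The same ordering applies when loading from a URL with HtmlWeb.Load: call HtmlNode.ElementsFlags.Remove("form") first, then load the page, then query //form and read InnerHtml.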

