
How to split the results of a PHP XPath query

I am trying to build a page where, when a website URL is entered into the input, my PHP scrapes that page and displays the input names for each form on it.

I have accomplished this, but I'm trying to split the results to make them easier to read when there are multiple forms on a page.

<form action="" method="post">
                <label style="color:#000000; font-family:arial, helvetica, sans-serif; font-size:16px; display:block;">Website URL:</label><br>
                <input type="text" name="website-url-value" id="website-url-value" style="border:1px solid #000;" />
                <div style="display:block; clear:both; margin-bottom:20px;"></div>
                <input type="submit" name="submit" value="Find forms" />
            </form>

        <?php
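            // Only fetch once the form has been submitted; avoids an error on the first page load
            $html = !empty($_POST['website-url-value']) ? file_get_contents($_POST['website-url-value']) : '';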
            $website_doc = new DOMDocument();
            libxml_use_internal_errors(TRUE); //disable libxml errors
            if(!empty($html)){ //if any html is actually returned
                $website_doc->loadHTML($html);
                libxml_clear_errors(); //remove errors for bad html

                $website_xpath = new DOMXPath($website_doc);
                $form_total = 1; // initial form counter
                //get all the form fields
                $full_forms = $website_xpath->query('
                    //form
                '); // find forms on page
                $full_inputs = $website_xpath->query('
                    //input[@type="text"]|
                    //input[@type="radio"]|
                    //input[@type="checkbox"]|
                    //input[@type="tel"]|
                    //input[@type="email"]|
                    //input[@type="date"]|
                    //input[@type="number"]|
                    //input[@type="time"]|
                    //textarea|
                    //select'
                ); // find form fields with these types
                if($full_inputs->length > 0){
                    foreach($full_inputs as $single_input){
                        echo $single_input->getAttribute('name') . '<br />'; // show each field followed by new line
                    }
                }
                if($full_forms->length > 0){
                    foreach($full_forms as $single_form){
                        echo '<strong>' . $single_form->nodeName . " " . $form_total++ . '</strong><br />'; // show form plus count
                    }
                }
            }
        ?>

I expect the result to look like:

Form 1: FirstName LastName Email

Form 2: FirstName LastName Phone

But the result I am currently getting is:

FirstName LastName Email FirstName LastName Phone Form 1: Form 2:

What you are doing is getting every input from the whole HTML document. What you need to do is take one form at a time and get only the inputs that belong to it.

One more thing: an XPath query returns a DOMNodeList, and each node in that list can be used as the context for a further query. To do that, pass the node (not the whole list) as the second parameter of query() and use a relative path, such as the descendant axis or .//, so the search stays inside that node.
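To illustrate the context-node behaviour, here is a minimal sketch that uses a made-up inline HTML string instead of a fetched page (the markup and variable names are assumptions for illustration only). It shows that an absolute path such as //input still searches the whole document even when a context node is passed, while a relative path such as .//input stays inside that node.

```php
<?php
// Hypothetical sample markup: two forms with different fields.
$sample_html = '<html><body>'
    . '<form><input type="text" name="FirstName"><input type="email" name="Email"></form>'
    . '<form><div><input type="tel" name="Phone"></div></form>'
    . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($sample_html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$first_form = $xpath->query('//form')->item(0);

// Absolute path: starts at the document root, so the context node has no effect.
$all_inputs = $xpath->query('//input', $first_form);

// Relative path: only inputs inside the first form match.
$form_inputs = $xpath->query('.//input', $first_form);

echo $all_inputs->length . "\n";  // 3
echo $form_inputs->length . "\n"; // 2
```

This is why the per-form query needs a relative expression: with //input, every form would list the fields of the entire page.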

Try this:

if(!empty($html)) {
    $website_doc = new DOMDocument();

    libxml_use_internal_errors(TRUE); //disable libxml errors

    $website_doc->loadHTML($html);

    libxml_clear_errors(); //remove errors for bad html

    $xpath = new DOMXPath($website_doc);

    $forms = $xpath->query("//form");

    foreach($forms as $key => $form) {
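        // Each branch of the union needs its own relative path (.//); otherwise only
        // the first branch searches all descendants and the rest match direct children only
        $inputs = $xpath->query('
            .//input[@type="text"]|
            .//input[@type="radio"]|
            .//input[@type="checkbox"]|
            .//input[@type="tel"]|
            .//input[@type="email"]|
            .//input[@type="date"]|
            .//input[@type="number"]|
            .//input[@type="time"]|
            .//textarea|
            .//select', $form);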

        echo "Form ".($key+1)." <br>";

        foreach ($inputs as $input) {
            echo $input->getAttribute('name') . '<br />';
        }

        echo "<br>";
    }
}
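
For testing without fetching a live page, the same idea can be wrapped in a small function. The sketch below is only one possible arrangement: the helper name collect_form_fields and the sample markup are hypothetical, and the arrow function assumes PHP 7.4 or later. It returns one array of field names per form, so the output format can be decided separately.

```php
<?php
/**
 * Hypothetical helper: returns one array of field names per <form>, in document order.
 */
function collect_form_fields(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // suppress warnings for malformed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    $xpath = new DOMXPath($doc);
    $types = ['text', 'radio', 'checkbox', 'tel', 'email', 'date', 'number', 'time'];

    // Builds: .//input[@type="text" or @type="radio" ...] | .//textarea | .//select
    $type_tests = implode(' or ', array_map(fn($t) => '@type="' . $t . '"', $types));
    $query = './/input[' . $type_tests . '] | .//textarea | .//select';

    $result = [];
    foreach ($xpath->query('//form') as $form) {
        $names = [];
        // The form is the context node, so only its own fields are returned.
        foreach ($xpath->query($query, $form) as $field) {
            $names[] = $field->getAttribute('name');
        }
        $result[] = $names;
    }
    return $result;
}

// Made-up sample page with two forms; the submit button is deliberately not in the type list.
$sample_html = '<html><body>'
    . '<form><input type="text" name="FirstName"><input type="text" name="LastName">'
    . '<input type="email" name="Email"><input type="submit" value="Send"></form>'
    . '<form><p><input type="text" name="FirstName"><input type="text" name="LastName">'
    . '<input type="tel" name="Phone"></p></form>'
    . '</body></html>';

foreach (collect_form_fields($sample_html) as $index => $names) {
    echo 'Form ' . ($index + 1) . ': ' . implode(' ', $names) . "\n";
}
// Form 1: FirstName LastName Email
// Form 2: FirstName LastName Phone
```

Returning plain arrays keeps the scraping step apart from the HTML output, so the same data could be printed with <br /> tags as in the code above.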
