I am trying to build a page where, when a website URL is entered into the input, my PHP scrapes that page and displays the input names for each form on it.
I have this working, but I'm trying to split the results per form so they are easier to read when a page contains multiple forms.
<form action="" method="post">
<label style="color:#000000; font-family:arial, helvetica, sans-serif; font-size:16px; display:block;">Website URL:</label><br>
<input type="text" name="website-url-value" id="website-url-value" style="border:1px solid #000;" />
<div style="display:block; clear:both; margin-bottom:20px;"></div>
<input type="submit" name="submit" value="Find forms" />
</form>
<?php
$html = file_get_contents($_POST['website-url-value']);
$website_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$website_doc->loadHTML($html);
libxml_clear_errors(); //remove errors for bad html
$website_xpath = new DOMXPath($website_doc);
$form_total = 1; // initial form counter
//get all the form fields
$full_forms = $website_xpath->query('
//form
'); // find forms on page
$full_inputs = $website_xpath->query('
//input[@type="text"]|
//input[@type="radio"]|
//input[@type="checkbox"]|
//input[@type="tel"]|
//input[@type="email"]|
//input[@type="date"]|
//input[@type="number"]|
//input[@type="time"]|
//textarea|
//select'
); // find form fields with these types
if($full_inputs->length > 0){
foreach($full_inputs as $single_input){
echo $single_input->getAttribute('name') . '<br />'; // show each field followed by new line
}
}
if($full_forms->length > 0){
foreach($full_forms as $single_form){
echo '<strong>' . $single_form->nodeName . " " . $form_total++ . '</strong><br />'; // show form plus count
}
}
}
?>
I expect the result to look like: Form 1: FirstName LastName Email
Form 2: FirstName LastName Phone
But currently the result I am getting is:
FirstName LastName Email FirstName LastName Phone Form 1: Form 2:
You are querying every input in the whole HTML document at once. Instead, get one form at a time and query only the inputs that belong to it.
DOMXPath::query() returns a DOMNodeList, and each node in that list can be passed back to query() as the second argument (the context node) to search further, relative to that node. Use the descendant axis (or the .// shorthand) in the expression so that it matches fields nested at any depth inside the form.
Try this:
if(!empty($html)) {
$website_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
$website_doc->loadHTML($html);
libxml_clear_errors(); //remove errors for bad html
$xpath = new DOMXPath($website_doc);
$forms = $xpath->query("//form");
foreach($forms as $key => $form) {
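// Every branch of the union needs its own ".//" prefix. An axis written once
// at the start applies only to the first branch, so the remaining branches
// would match direct children of the form only.
$inputs = $xpath->query('
.//input[@type="text"]|
.//input[@type="radio"]|
.//input[@type="checkbox"]|
.//input[@type="tel"]|
.//input[@type="email"]|
.//input[@type="date"]|
.//input[@type="number"]|
.//input[@type="time"]|
.//textarea|
.//select', $form);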
echo "Form ".($key+1)." <br>";
foreach ($inputs as $input) {
echo $input->getAttribute('name') . '<br />';
}
echo "<br>";
}
}
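To try the per-form query without fetching a live page, here is a minimal self-contained sketch. The helper name list_form_fields and the sample markup are made up for illustration and are not part of the original code. The field types are trimmed to three to keep it short, and the inputs are wrapped in a div or p to show that .// reaches nested fields.

```php
<?php
// Hypothetical helper (not from the original post): returns the field names
// of each form on the page, keyed by a 1-based form number.
function list_form_fields(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    $number = 1;
    foreach ($xpath->query('//form') as $form) {
        $names = [];
        // ".//" keeps every branch of the union relative to the current form.
        $fields = $xpath->query(
            './/input[@type="text" or @type="email" or @type="tel"] | .//textarea | .//select',
            $form
        );
        foreach ($fields as $field) {
            $names[] = $field->getAttribute('name');
        }
        $result[$number++] = $names;
    }
    return $result;
}

// Made-up sample markup standing in for the fetched page.
$sample = '<html><body>
<form><div><input type="text" name="FirstName"><input type="text" name="LastName"><input type="email" name="Email"></div></form>
<form><p><input type="text" name="FirstName"><input type="text" name="LastName"><input type="tel" name="Phone"></p></form>
</body></html>';

foreach (list_form_fields($sample) as $number => $names) {
    echo "Form $number: " . implode(' ', $names) . PHP_EOL;
}
```

With this sample markup, the loop should print one line per form, each listing only that form's own field names. Returning an array rather than echoing inside the helper keeps the scraping separate from the HTML output, so the same result can be rendered however you like.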