Check html syntax in string

I cannot find a solution for this.

How can I check whether a string containing HTML code is syntactically correct?

Example:

<p><o:p></o:p></p> 
<p> <br /> </p> 
<p><b style=\"font-weight: bold;\"><b>Desc: </b>AnyText.</p> 
 <br /> </p> 
<p><b>Color:</b> green<
<p> <b>Param 2: AU55688</p> 
<p><b>Param 3: </b>420 x 562</p> 
<p><b>Height: </b>1425</p>

If the string contains unclosed or unopened tags, return the string; if everything is fine, skip it.

I found a function and modified it, but it does not work properly:

function closetag($html)
{
    $ignore_tags = array('img', 'br', 'hr');

    preg_match_all ( "#<([a-z]+)( .*)?(?!/)>#iU", mb_strtolower($html), $result1);
    preg_match_all ( "#</([a-z]+)>#iU", mb_strtolower($html), $result2);
    $results_start = $result1[1];
    $results_end = $result2[1];

    $result = array();
    foreach($results_start AS $startag)
    {
        if (!in_array($startag, $results_end) && !in_array($startag, $ignore_tags))
        {
            $result['start_tags'][] = $startag;
        }
    }
    foreach($results_end AS $endtag)
    {
        if (!in_array($endtag, $results_start) && !in_array($endtag, $ignore_tags))
        {
            $result['end_tags'][] = $endtag;
        }
    }

    return ($result) ? $result : false;
}

I do not need to correct the HTML; I only need to determine whether its syntax is incorrect.

An example of the result I want:

$getTexts = $this->getTexts();

$no_valid = array();
foreach($getTexts AS $text)
{
    $_valid = check_html_systax_function($text);
    if (!$_valid)
    {
        $no_valid[] = $text;
    }
}

check_html_systax_function checks a text for correct HTML syntax.

$no_valid is the array of texts that contain HTML syntax errors.

PS Sorry for my English!

Do not use Regex to parse or validate HTML.

For PHP, there is the DOMDocument class. You can use it as follows:

$dom = new DOMDocument;
$dom->loadHTML($html);
if ($dom->validate()) {
    //valid HTML code
}
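
Note that validate() checks the document against its DTD, which can be stricter than a plain "are the tags closed?" test. As an alternative, here is a minimal sketch that collects the messages libxml raises while loadHTML() parses the string. The function name html_parse_errors is hypothetical, and which problems get reported depends on the libxml2 version PHP was built with, so treat the result as a hint rather than a full validation.

```php
<?php
// Hypothetical helper: returns the libxml messages raised while parsing $html.
// An empty array means the parser did not complain.
function html_parse_errors(string $html): array
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    if (trim($html) !== '') {
        // loadHTML() rejects an empty string, so only parse non-empty input.
        $dom->loadHTML($html);
    }

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }

    // Restore the previous error-handling mode for the rest of the script.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Usage in the spirit of the loop from the question ($texts is assumed input).
$texts = ['<p><b>Height: </b>1425</p>', '<p> <b>Param 2: AU55688</p>'];
$no_valid = [];
foreach ($texts as $text) {
    if (html_parse_errors($text) !== []) {
        $no_valid[] = $text;
    }
}
```

Saving and restoring the libxml_use_internal_errors() state keeps the helper from changing error handling elsewhere in the application.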

If you're looking for a library that offers more configurability and detailed error reporting, check HTML Purifier.

You can check the following links for PHP HTML DOM parsers:

You can check whether the HTML is valid with the following code:

function closetags($html) {
    preg_match_all('#<(?!meta|img|br|hr|input\b)\b([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
    $openedtags = $result[1];
    preg_match_all('#</([a-z]+)>#iU', $html, $result);
    $closedtags = $result[1];
    $len_opened = count($openedtags);
    if (count($closedtags) == $len_opened) {
        echo 'valid html'; 
    } else {
        echo 'invalid html';
    }
} 

$html = '<p>This is some text and here is a <strong>bold text then the post stop here....</p>';
closetags($html);
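
Comparing only the number of opening and closing tags accepts input such as <b><i>x</b></i>, where the counts match but the nesting is wrong. A rough sketch that tracks nesting with a stack could look like the following. The function name tags_are_balanced is hypothetical, the regex tokenising is a heuristic rather than a real parser, and it assumes fragments without comments, doctype declarations, or script content.

```php
<?php
// Hypothetical helper: true when every non-void tag is closed in the right order.
function tags_are_balanced(string $html): bool
{
    // Void elements never take a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'];

    // Strip every complete tag; a "<" that remains is a stray or truncated tag.
    $leftover = preg_replace('#</?[a-z][^<>]*>#i', '', $html);
    if (strpos($leftover, '<') !== false) {
        return false;
    }

    // Capture: optional leading "/", tag name, optional trailing "/".
    preg_match_all('#<(/?)([a-z][a-z0-9:]*)[^<>]*?(/?)>#i', $html, $tags, PREG_SET_ORDER);

    $stack = [];
    foreach ($tags as [, $closing, $name, $selfClosing]) {
        $name = strtolower($name);
        if (in_array($name, $void, true) || $selfClosing === '/') {
            continue; // nothing to open or close
        }
        if ($closing === '') {
            $stack[] = $name;                   // opening tag: remember it
        } elseif (array_pop($stack) !== $name) {
            return false;                       // closing tag does not match the last open one
        }
    }

    return $stack === [];                       // anything left open is an error
}

var_dump(tags_are_balanced('<p><b>Height: </b>1425</p>')); // bool(true)
var_dump(tags_are_balanced('<b><i>x</b></i>'));            // bool(false)
```

Because it returns a boolean instead of echoing, this version can be dropped into a loop like the one in the question.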

I've created a method based on the regex by Charvi.

It is available in text utilities: https://github.com/Alex-KOR/Text-utilities
