Detect text between some tags

I am trying to detect the text between 3 or 4 kinds of tags in PHP, and I have no idea how. I know I'm supposed to use a regex, but that's too hard for my mind :X

If you can explain how to do it, or give me an example of what I need, that would be great!

I am trying to detect the code between <script> tags, which means that if I have <script type="text/javascript"> it should detect that too. If there's <script src="...">, then it shouldn't detect the text between the tags (there shouldn't be any).

The same goes for <style>: if there's <style type="text/css">, it should detect the text between the tags too.

I also want to detect the text inside a style="detect text here" attribute.

The last tag I want to get the text between is <?php ?>. (It can also be written in upper case, as PHP, so I don't want the regex to be case-sensitive.)

Thanks to everyone who helps!!!

Using regular expressions, you could write something like:

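<?php
$html = <<<EOF
<script type="text/javascript">
    function xyz() { alert('some alert'); }
</script>
EOF;

// s: let . match newlines; U: make quantifiers lazy (ungreedy) by default,
// so <script.*> stops at the first '>'; i: also match <SCRIPT>.
preg_match('/<script.*>(.*)<\/script>/sUi', $html, $matches);

// $matches[1] holds the code between the tags.
var_dump($matches);
?>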
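The same idea extends to the other cases in the question. Below is one possible sketch using `preg_match_all`, assuming fairly simple markup (no `>` inside attribute values, no commented-out or nested tags); the sample input is made up for illustration. The `i` flag makes every pattern case-insensitive, which covers `<?PHP` as well as `<?php`, and `s` lets `.` match newlines.

```php
<?php
// Sketch: extract the four kinds of content from the question with regexes.
// Assumes simple, well-formed markup; the sample $html is invented.
$html = <<<'EOF'
<script type="text/javascript">var a = 1;</script>
<script src="lib.js"></script>
<style type="text/css">p { color: red; }</style>
<p style="margin: 0">Hello</p>
<?PHP echo 'hi'; ?>
EOF;

// 1. <script> blocks without a src attribute. The negative lookahead
//    rejects any opening tag that contains src= before its closing '>'.
preg_match_all('/<script\b(?![^>]*\bsrc\s*=)[^>]*>(.*?)<\/script>/is', $html, $m);
$scripts = $m[1];

// 2. Contents of <style> blocks.
preg_match_all('/<style\b[^>]*>(.*?)<\/style>/is', $html, $m);
$styles = $m[1];

// 3. Values of style="..." or style='...' attributes (\1 reuses the quote).
preg_match_all('/\bstyle\s*=\s*(["\'])(.*?)\1/is', $html, $m);
$styleAttrs = $m[2];

// 4. Code inside PHP open/close tags, matched case-insensitively (i flag).
preg_match_all('/<\?php\b(.*?)\?>/is', $html, $m);
$phpBlocks = array_map('trim', $m[1]);

var_dump($scripts, $styles, $styleAttrs, $phpBlocks);
```

The lazy `(.*?)` in each pattern stops at the first closing tag, so several blocks in one document are captured separately rather than merged into one match.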

Regular expressions aren't well suited to parsing HTML. For good reasons why, see the question "Can you provide some examples of why it is hard to parse XML and HTML with a regex?"
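To make the point concrete, here is a small sketch (the `data-note` attribute is made up) where a `>` inside a quoted attribute value fools the regex from the first answer, while `DOMDocument` reads the same markup correctly:

```php
<?php
// Sketch: markup that trips up the simple regex but not a real HTML parser.
// data-note is a hypothetical attribute whose value contains '>'.
$html = '<script data-note="a>b">var x = 1;</script>';

// The regex from the first answer: with the U (ungreedy) flag, <script.*>
// stops at the first '>', which here is inside the attribute value.
preg_match('/<script.*>(.*)<\/script>/sU', $html, $m);
$regexResult = $m[1]; // 'b">var x = 1;'

// DOMDocument tokenizes attributes properly, so the quoted '>' is harmless.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$domResult = $doc->getElementsByTagName('script')->item(0)->textContent; // 'var x = 1;'

var_dump($regexResult, $domResult);
```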

You'll have an easier time loading the HTML into PHP's DOM classes (DOMDocument) and then running XPath queries (DOMXPath) to extract the tags you want.

For example, try something like this to get all the <script> tags that don't have a src attribute:

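<?php
// Collect warnings about imperfect real-world HTML instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTMLFile("myfile.html");

$xpath = new DOMXPath($doc);

// Find script elements anywhere in the document that don't have a src attribute.
// The leading // matters: a bare "script" is a relative path and would only
// match script elements that are direct children of the context node.
$scriptNodes = $xpath->query("//script[not(@src)]");
foreach ($scriptNodes as $scriptNode) {
    // The code between <script> and </script>.
    $code = $scriptNode->textContent;
    // do something with $code here...
}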
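The XPath approach also covers the `<style>` cases from the question. A minimal sketch, assuming the HTML is available as a string (the sample markup is made up): `//style` selects style elements, and `//@style` selects style attributes, whose text is in `DOMAttr::$value`. `<?php ?>` blocks are not HTML, so for those a case-insensitive regex such as `/<\?php\b(.*?)\?>/is` is probably still the simpler option.

```php
<?php
// Sketch: DOMXPath queries for the HTML-based cases from the question.
// The inline $html string is an invented stand-in for the real input.
$html = <<<'EOF'
<html><head>
<script type="text/javascript">var a = 1;</script>
<script src="lib.js"></script>
<style type="text/css">p { color: red; }</style>
</head><body><p style="margin: 0">Hello</p></body></html>
EOF;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Inline scripts only: skip any <script> that has a src attribute.
$scripts = [];
foreach ($xpath->query('//script[not(@src)]') as $node) {
    $scripts[] = $node->textContent;
}

// Contents of every <style> element.
$styles = [];
foreach ($xpath->query('//style') as $node) {
    $styles[] = $node->textContent;
}

// Attribute nodes: the value of every style="..." attribute in the document.
$styleAttrs = [];
foreach ($xpath->query('//@style') as $attr) {
    $styleAttrs[] = $attr->value;
}

var_dump($scripts, $styles, $styleAttrs);
```

Because the parser has already split tags and attributes, none of these queries has to worry about quoting, attribute order, or upper-case tag names.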

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.