PHP/regex: How do I get the string value of an HTML tag?

Question

I need help with regex or preg_match because I am not very experienced with them yet, so here is my problem.

I need to get the value "get me", but I think my function has an error. The number of HTML tags is dynamic: the string can contain many nested HTML tags, such as a bold tag. The "get me" value is also dynamic.

<?php
function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname>(.*?)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>

Answer 1

<?php
function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>

That should do the trick.

Answer 2

Try this:

$str = '<option value="123">abc</option>
        <option value="123">aabbcc</option>';

preg_match_all("#<option.*?>([^<]+)</option>#", $str, $foo);

print_r($foo[1]);
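Applied to the question's string, the same approach might look like the sketch below, assuming the target tag is <font>. The variable names $nested and $bar are hypothetical. Because [^<]+ stops at the next "<", this pattern only captures text that contains no nested markup.

```php
<?php
// Answer 2's pattern adapted to the question's <font> tag (an assumption).
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
preg_match_all('#<font.*?>([^<]+)</font>#', $str, $foo);
print_r($foo[1]); // one entry: "get me"

// Hypothetical input with nested markup: [^<]+ cannot cross the <b> tag,
// so nothing is captured here.
$nested = '<font size="10">get <b>me</b></font>';
preg_match_all('#<font.*?>([^<]+)</font>#', $nested, $bar);
print_r($bar[1]); // empty array
```

If the text can contain nested tags such as <b>, a parser-based approach is the safer choice.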

Answer 3

In your pattern, you simply want to match all text between the two tags, so you could use, for example, [\w\W] to match any character, including newlines.

function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname>([\w\W]*?)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}
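To see what [\w\W] gives you over a plain dot, you can run both against text that spans a line break. The sketch below uses a made-up <b> snippet and also shows the s modifier, which is the more common way to let "." match newlines.

```php
<?php
// [\w\W] matches any character, including "\n"; "." only does so with the s modifier.
$html = "<b>line one\nline two</b>";

preg_match('/<b>(.*?)<\/b>/', $html, $dot);       // no match: "." cannot cross "\n"
preg_match('/<b>([\w\W]*?)<\/b>/', $html, $any);  // matches across the newline
preg_match('/<b>(.*?)<\/b>/s', $html, $dotAll);   // same result using the s modifier

var_dump($dot, $any[1], $dotAll[1]);
```

Both [\w\W] and the s modifier work; the modifier keeps the pattern shorter and makes the intent explicit.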

Answer 4

Since attribute values may contain a plain > character, try this regular expression:

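// Attribute values can be unquoted text, "double-quoted" or 'single-quoted'; the s modifier lets (.*?) span lines.
$pattern = '/<'.preg_quote($tagname, '/').'(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*>(.*?)<\/'.preg_quote($tagname, '/').'>/s';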

However, regular expressions are not suitable for parsing non-regular languages such as HTML. You would be better off using a parser such as SimpleXML or DOMDocument.
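Following that advice, a minimal DOMDocument sketch might look like this. It assumes the dom extension is enabled; the helper name getTextOfFirstTag() is hypothetical, and libxml warnings about the non-standard <textformat> tag are collected quietly instead of being printed.

```php
<?php
// Hypothetical helper: returns the text of the first <$tagname> element, or null.
function getTextOfFirstTag(string $html, string $tagname): ?string
{
    $doc = new DOMDocument();
    // libxml warns about tags it does not know (e.g. <textformat>); collect those quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName($tagname);
    if ($nodes->length === 0) {
        return null;
    }
    // textContent concatenates all text inside the element, including nested tags.
    return $nodes->item(0)->textContent;
}

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
echo getTextOfFirstTag($str, 'font'), "\n";
```

Unlike the regex versions, this also copes with nested markup such as <font>get <b>me</b></font>, because textContent collects the text of all descendant nodes.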

Answer 5

This question may be old, but my answer might help someone.

You can simply use:

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
echo strip_tags($str);

https://www.php.net/manual/en/function.strip-tags.php
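Note that strip_tags() removes every tag in the string, so any text outside the tag you are interested in is returned as well. The snippet below illustrates this with a hypothetical two-element string, and also shows the optional second argument, which lists tags to keep.

```php
<?php
// strip_tags() drops all markup but keeps every text node, from every element.
$str = '<p>intro</p><font size="10">get me</font>';
echo strip_tags($str), "\n"; // prints "introget me"

// The optional second argument lists tags to keep, here only <font>.
echo strip_tags($str, '<font>'), "\n";
```

This works well for the question's string, where "get me" is the only text, but not when the surrounding markup contains text of its own.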

Answer 6

The following PHP snippet returns the text between HTML tags/elements.

The regex "/tagname(.*)endtag/" captures the text between the tags.

For example:

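// "[", "]" and "/" must be escaped; otherwise they act as a character class and as the delimiter.
$regex = '/\[start_tag_name\](.*)\[\/end_tag_name\]/';
$content = "[start_tag_name]SOME TEXT[/end_tag_name]";
// Replace the whole match with capture group 1, i.e. the text between the tags.
echo preg_replace($regex, '$1', $content);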

This returns "SOME TEXT".

Answer 7

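$userinput = "http://www.example.vn/";
$input = @file_get_contents($userinput) or die("Could not access file: $userinput");
// Replace "tagname" with the tag you want; \s[^>]* matches the attributes,
// so a bare <tagname> without any attributes is not matched.
$regexp = "<tagname\s[^>]*>(.*)<\/tagname>";
//==Example:
//$regexp = "<div\s[^>]*>(.*)<\/div>";

// s: "." also matches newlines, i: case-insensitive, U: quantifiers are lazy by default
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[0] = the whole element, $match[1] = the text between the tags
        echo $match[1], "\n";
    }
}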
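Since the snippet above downloads a page, here is an offline sketch of the same pattern and flags. The literal $input string is a hypothetical stand-in for the downloaded page, "div" replaces "tagname", and $texts is an illustrative name.

```php
<?php
// Offline sketch: a literal string stands in for the downloaded page.
$input = '<div class="a">first</div><div id="b">second</div>';
$regexp = "<div\s[^>]*>(.*)<\/div>";

$texts = [];
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $texts[] = $match[1]; // text between the tags
    }
}
print_r($texts);
```

Without the U flag, (.*) would be greedy and a single match would run from the first <div to the last </div>.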

Answer 8

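Try $pattern = "/<($tagname)\\b.*?>(.*?)<\\/\\1>/" and return $matches[2]. Group 1 captures the tag name so that the backreference \1 matches the same closing tag, and group 2 holds the text.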
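Put together as a complete function, this idea might look like the sketch below. Escaping the tag name with preg_quote() and returning null when nothing matches are additions of this sketch, not part of the original answer.

```php
<?php
function getTextBetweenTags(string $string, string $tagname): ?string
{
    // Group 1 = tag name, \1 = same name in the closing tag, group 2 = inner text.
    $pattern = '/<(' . preg_quote($tagname, '/') . ')\b.*?>(.*?)<\/\1>/';
    return preg_match($pattern, $string, $matches) ? $matches[2] : null;
}

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
echo getTextBetweenTags($str, 'font'), "\n";
```

The \b after the tag name keeps "font" from matching a longer tag name such as "fontx", and .*? skips over any attributes.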

Answer 9

Your HTML:

$html='<ul id="main">
    <li>
        <h1><a href="[link]">My Title</a></h1>
        <span class="date">Date</span>
        <div class="section">
            [content]
        </div>
    </li>
</ul>';

// Function call (you can change the tag name)

echo contentBetweenTags($html,"span");

// This function fetches the content of a specific tag

function contentBetweenTags($content, $tagname){
    $pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
    preg_match($pattern, $content, $matches);
    
    if(empty($matches))
        return;
    
    $str = "<$tagname>".html_entity_decode($matches[1])."</$tagname>";
    return $str;
}

Answer summary (9 answers; score and date):

Answer 1: score 69, accepted, 2009-05-06 09:58:42
Answer 2: score 15, 2012-01-21 21:39:02
Answer 3: score 8, 2009-05-06 09:58:44
Answer 4: score 2, 2009-09-22 05:36:17
Answer 5: score 1, 2020-06-08 00:33:01
Answer 6: score 0
Answer 7: score 0, 2013-01-11 02:47:16
Answer 8: score 0, 2015-08-06 21:22:07
Answer 9: score 0, 2021-01-28 10:28:06