How to retrieve the plain text inside HTML tags using PHP?

Question

I have a form that accepts HTML data, but we need only the text it contains, nothing else. Is there a particular way to extract the text from HTML in PHP?

Answer 1

Use strip_tags().
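
A minimal sketch of how that call might be used, assuming the submitted markup is available as a string. The variable names and the sample markup are illustrative only. Note that strip_tags() removes the tags but leaves entities such as &amp; as they are.

```php
<?php
// Hypothetical form input; in practice this might come from $_POST.
$html = '<p>Hello <b>world</b> &amp; welcome</p>';

// strip_tags() removes HTML and PHP tags but keeps the text between them.
$text = strip_tags($html);

echo $text, PHP_EOL; // prints: Hello world &amp; welcome
```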

Answer 2

It can certainly be done. Take a look at this function and use it as you like:

function html2txt ($document)
{
    // Each pattern uses ' as its regex delimiter. $search and $replace must
    // stay the same length and in the same order.
    $search = array (
            "'<script[^>]*?>.*?</script>'si", // Strip out JavaScript code
            "'<!--.*?-->'s",                  // Strip out comments, including multi-line ones
            "'<[\/\!]*?[^<>]*?>'si",          // Strip out HTML tags
            "'([\r\n])[\s]+'",                // Collapse a line break and the white space after it
            "'&(quot|#34|#034|#x22);'i",      // Replace HTML entities
            "'&(lt|#60|#060|#x3c);'i",        // (hexadecimal forms included)
            "'&(gt|#62|#062|#x3e);'i",
            "'&(nbsp|#160|#xa0);'i",
            "'&(iexcl|#161);'i",
            "'&(cent|#162);'i",
            "'&(pound|#163);'i",
            "'&(copy|#169);'i",
            "'&(reg|#174);'i",
            "'&(deg|#176);'i",
            "'&(#39|#039|#x27);'",
            "'&(euro|#8364);'i",         // Europe
            "'&a(uml|UML);'",            // German
            "'&o(uml|UML);'",
            "'&u(uml|UML);'",
            "'&A(uml|UML);'",
            "'&O(uml|UML);'",
            "'&U(uml|UML);'",
            "'&szlig;'i",
            "'&(amp|#38|#038|#x26);'i",  // Decode & last so "&amp;lt;" is not decoded twice
            );
    // Literal characters below assume this file is saved as UTF-8.
    $replace = array (    "",
                "",
                "",
                " ",
                "\"",
                "<",
                ">",
                " ",
                "¡",
                "¢",
                "£",
                "©",
                "®",
                "°",
                "'",
                "€",
                "ä",
                "ö",
                "ü",
                "Ä",
                "Ö",
                "Ü",
                "ß",
                "&",
            );

    $text = preg_replace($search, $replace, $document);

    return trim ($text);
}
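
A hand-written entity table like the one above only covers the entities it lists. As a hedged alternative, the sketch below combines strip_tags() with PHP's built-in html_entity_decode(). The function name htmlToText and the sample input are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper name; not part of the original answer.
function htmlToText(string $html): string
{
    // Drop <script> and <style> elements together with their contents,
    // because strip_tags() would keep the code inside them as text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html) ?? $html;

    // Remove the remaining tags and comments.
    $text = strip_tags($html);

    // Decode named and numeric entities in one pass, producing UTF-8.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of white space into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

echo htmlToText('<p>Caf&eacute; &amp; bar</p><script>alert(1)</script>'), PHP_EOL;
// prints: Café & bar
```

Because entities are decoded only after the tags are gone, an escaped sequence such as &lt;b&gt; comes out as the literal text <b> rather than being treated as markup.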

Answer 3

You can parse the HTML using DOMDocument::loadHTMLFile and extract what you need.

$doc = new DOMDocument();
$doc->loadHTMLFile("data.html");
$metaTags = $doc->getElementsByTagName('meta');
// Process $metaTags
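
For form input held in a string rather than a file, a rough sketch along the same lines could use DOMDocument::loadHTML and read the textContent property of the body element. The sample markup and variable names here are assumptions for illustration.

```php
<?php
// Hypothetical form input held in a string rather than a file.
$html = '<div><h1>Title</h1><p>Some <em>text</em> here.</p></div>';

// Collect libxml warnings about malformed markup instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// loadHTML() wraps fragments in <html><body>; textContent concatenates
// the text of every descendant node of the body.
$body = $doc->getElementsByTagName('body')->item(0);
$text = $body !== null ? trim($body->textContent) : '';

echo $text, PHP_EOL; // prints: TitleSome text here.
```

As the output shows, textContent adds no separators between block elements, so adjacent headings and paragraphs run together. Input containing non-ASCII characters may also need an explicit charset declaration before parsing.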

Answer 1
5, accepted, 2010-01-23 18:21:28

Answer 2
2, 2010-01-23 20:22:17

Answer 3
1, 2010-01-23 18:36:23
