Get words from a string, skipping HTML

Question

I use a function to get the first "x" words of a string. The main part is:

preg_match_all('/(<\/?([\w+]+)[^>]*>)?([^<>]*)/', $text, $tags, PREG_SET_ORDER);

When a word is inside HTML, for example:

<a href="/"><u>Linktext</u></a>

the regex counts "Linktext" as a word. The regex should be changed to skip every word inside an HTML tag.

Is this possible?

Answer 1

Use an XSL transformation. I used the template from a related answer (how to remove all text from an XML document):

$string = '<a href="/">Some text <u>Linktext</u> more text</a>';
$xslTemplate = '<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- copy all nodes -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- clear attributes -->
  <xsl:template match="@*">
    <xsl:attribute name="{name()}" />
  </xsl:template>
  <!-- ignore text content of nodes -->
  <xsl:template match="text()" />
</xsl:stylesheet>';

libxml_use_internal_errors(true);

$inputDom = new DOMDocument();
$inputDom->loadHTML($string);

$xslDom = new DOMDocument();
$xslDom->loadXML($xslTemplate);

$cp = new XSLTProcessor();
$cp->registerPHPFunctions();
$cp->importStylesheet($xslDom);

$transformedResult = $cp->transformToDoc($inputDom);
$transformedHtmlString = $transformedResult->saveXML($transformedResult->getElementsByTagName('body')->item(0));

$transformedHtmlString = str_replace('<body>','', $transformedHtmlString); // saveXML() keeps the automatically created <body> wrapper, so strip it
$transformedHtmlString = str_replace('</body>','', $transformedHtmlString);
echo $transformedHtmlString;
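
The XSL approach above keeps the markup and drops the text. If the goal is instead the one the question title describes, taking the first "x" words while ignoring the markup, a minimal sketch could look like the following. It is a different technique from the XSL transformation, and the helper name `firstWords` and its parameters are hypothetical and not part of the original post. It assumes that `strip_tags()` is good enough for the input and that words are separated by whitespace.

```php
<?php
// Hypothetical helper (not from the original post): returns the first $limit
// words of the text content, ignoring everything inside tag brackets.
function firstWords(string $html, int $limit): string
{
    // strip_tags() removes the markup but keeps the text between the tags.
    $text = strip_tags($html);

    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY discards empty pieces.
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Keep at most $limit words and join them with single spaces.
    return implode(' ', array_slice($words, 0, $limit));
}

echo firstWords('<a href="/">Some text <u>Linktext</u> more text</a>', 3), "\n";
// prints "Some text Linktext"
```

Note that `strip_tags()` does not insert whitespace where a tag was, so text from adjacent elements such as `<p>a</p><p>b</p>` would run together as one word. Input like that would need a space inserted before each tag first.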
