
Replace HTML tags with BBCode

How can I replace certain HTML tags with BBCode-like tags?

For example, replace <a ...> ... </a> with [url ...] ... [/url], or <code ...> ... </code> with [code ...] ... [/code], in a string stored in $var.

You can write a custom XSLT stylesheet to convert the format, and run the input through an XSLT processor to get the desired output.
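As a rough illustration of the XSLT idea, here is a minimal sketch using PHP's XSLTProcessor. The function name xhtmlToBbcodeWithXslt, the <root> wrapper element, and the two templates are assumptions made for this example. It needs the dom and xsl extensions, and it only works when the input is well-formed XML (unclosed tags or entities such as &nbsp; will make parsing fail).

```php
<?php
// Hypothetical helper: converts a well-formed XHTML fragment to BBCode with XSLT 1.0.
// Requires PHP's dom and xsl extensions.
function xhtmlToBbcodeWithXslt(string $xhtml): string
{
    // Each template emits the BBCode tags around the converted children.
    // Text and unknown elements are handled by XSLT's built-in templates.
    $stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="a">[url=<xsl:value-of select="@href"/>]<xsl:apply-templates/>[/url]</xsl:template>
  <xsl:template match="code">[code]<xsl:apply-templates/>[/code]</xsl:template>
</xsl:stylesheet>
XSL;

    $xsl = new DOMDocument();
    $xsl->loadXML($stylesheet);

    $doc = new DOMDocument();
    // Wrap the fragment in a single root element so that it parses as XML.
    $doc->loadXML('<root>' . $xhtml . '</root>');

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);

    return (string) $processor->transformToXml($doc);
}
```

Calling xhtmlToBbcodeWithXslt($var) on a fragment containing <a href="..."> and <code> elements returns the text with those elements rewritten as [url=...] and [code]. Other tags are dropped and only their text is kept, so more templates would be needed for a full conversion.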

To convert old articles that contained HTML tags, I created this rather complicated script. The $body variable contains the article text. The procedure first replaces each pre and code block with a special marker. Once all the other tags have been converted, the script replaces each marker with the original code. It works with both HTML and BBCode text.

  // Let's find all code inside the body. The code can be inside <pre></pre>, <code></code>, or [code][/code] if you
  // are using BBCode markup language.
  $pattern = '%(?P<openpre><pre>)(?P<contentpre>[\W\D\w\s]*?)(?P<closepre></pre>)|(?P<opencode><code>)(?P<contentcode>[\W\D\w\s]*?)(?P<closecode></code>)|(?P<openbbcode>\[code=?\w*\])(?P<contentbbcode>[\W\D\w\s]*?)(?P<closebbcode>\[/code\])%i';

  if (preg_match_all($pattern, $body, $snippets)) {

    $pattern = '%<pre>[\W\D\w\s]*?</pre>|<code>[\W\D\w\s]*?</code>|\[code=?\w*\][\W\D\w\s]*?\[/code\]%i';

    // Replaces the code snippet with a special marker to be able to inject the code in place.
    $body = preg_replace($pattern, '___SNIPPET___', $body);
  }


  // Replace links.
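  // The \s after "<a" keeps tags such as <abbr> or <article> from being treated as links,
  // and the s flag lets the link text span several lines.
  $body = preg_replace_callback('%<a\s[^>]*>(.+?)</a>%is',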

    function ($matches) use ($item) {

      // Extracts the url.
      if (preg_match('/\s*(?i)href\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1) {
        // Strip either kind of quote. The URL keeps its case because paths can be case-sensitive.
        $href = trim($others[1], '"\'');

        // Extracts the target.
        if (preg_match('/\s*(?i)target\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1)
          $target = strtolower(trim($others[1], '"\''));
        else
          $target = "_self";
      }
      else
        throw new \RuntimeException(sprintf("Article with idItem = %d has malformed links", $item->idItem));

      return "[url=".$href." t=".$target."]".$matches[1]."[/url]";

    },

    $body
  );


  // Replace images.
  $body = preg_replace_callback('/<img[^>]+>/i',

    function ($matches) use ($item) {

      // Extracts the src. Strip either kind of quote and keep the case of the URL.
      if (preg_match('/\s*(?i)src\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1)
        $src = trim($others[1], '"\'');
      else
        throw new \RuntimeException(sprintf("Article with idItem = %d has malformed images", $item->idItem));

      return "[img]".$src."[/img]";

    },

    $body
  );


  // Replace other tags.
  $body = preg_replace_callback('%</?[a-z][a-z0-9]*[^<>]*>%i',

    function ($matches) {
      $tag = strtolower($matches[0]);

      // Stacked case labels share one return value. Tags with attributes match none of
      // the labels and fall through to the default branch.
      switch ($tag) {
        case '<strong>':
        case '<b>':
          return '[b]';

        case '</strong>':
        case '</b>':
          return '[/b]';

        case '<em>':
        case '<i>':
          return '[i]';

        case '</em>':
        case '</i>':
          return '[/i]';

        case '<u>':
          return '[u]';

        case '</u>':
          return '[/u]';

        case '<strike>':
        case '<del>':
          return '[s]';

        case '</strike>':
        case '</del>':
          return '[/s]';

        case '<ul>':
          return '[list]';

        case '<ol>':
          return '[list=1]';

        case '</ul>':
        case '</ol>':
          return '[/list]';

        case '<li>':
          return '[*]';

        case '</li>':
          return '';

        case '<center>':
          return '[center]';

        case '</center>':
          return '[/center]';

        default:
          // Any other tag is returned (lowercased) and removed later by strip_tags().
          return $tag;
      }
    },

    $body
  );


  // Now we strip the remaining HTML tags.
  $body = strip_tags($body);


  // Finally we can restore the snippets, converting the HTML tags to BBCode tags.
  $snippetsCount = count($snippets[0]);

  for ($i = 0; $i < $snippetsCount; $i++) {
    // We try to determine which tags the code is inside: <pre></pre>, <code></code>, [code][/code]
    if (!empty($snippets['openpre'][$i]))
      $snippet = "[code]".PHP_EOL.trim($snippets['contentpre'][$i]).PHP_EOL."[/code]";
    elseif (!empty($snippets['opencode'][$i]))
      $snippet = "[code]".PHP_EOL.trim($snippets['contentcode'][$i]).PHP_EOL."[/code]";
    else
      $snippet = $snippets['openbbcode'][$i].PHP_EOL.trim($snippets['contentbbcode'][$i]).PHP_EOL.$snippets['closebbcode'][$i];

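    // A callback is used so that "$1" or "\1" inside the code is inserted literally
    // instead of being interpreted as a backreference in the replacement string.
    $body = preg_replace_callback('/___SNIPPET___/', function ($m) use ($snippet) {
      return PHP_EOL.trim($snippet).PHP_EOL;
    }, $body, 1);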
  }

  //echo $body;

Converting HTML back to BBCode is not difficult. Libraries exist for that, and I'm fairly sure this question already has a duplicate with an answer, but I'm bad at searching too.

Basically you can use preg_replace like this:

 // for 1:1 translations
 $text = preg_replace('#<(/?)(b|i|code|pre)>#', '[$1$2]', $text);

 // complex tags
 $text = preg_replace('#<a href="([^"]+)">([^<]+)</a>#',
             "[url=$1]$2[/url]", $text);

But the second case will fail if your input HTML doesn't match the expectations exactly, for example when the link has other attributes or uses single quotes. If you try to convert exported Word files, such a simplistic approach will fail. You also need more special cases for [img] and similar tags.
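To make that limitation concrete, here is a hedged sketch of a somewhat more forgiving variant. The function name htmlToBbcodeTolerant and the exact patterns are assumptions for illustration, not part of the answer above. It accepts extra attributes, single-quoted or unquoted values, and upper-case tags, and it adds an [img] case, but it is still regex matching rather than real HTML parsing.

```php
<?php
// Hypothetical helper, not part of the original answer: a more forgiving
// regex-based converter. It is still a sketch, not an HTML parser.
function htmlToBbcodeTolerant(string $html): string
{
    // 1:1 translations: ignore case and any attributes on the tag.
    $text = preg_replace_callback(
        '#<(/?)(b|i|u|code|pre)(?:\s[^>]*)?>#i',
        function (array $m): string {
            return '[' . $m[1] . strtolower($m[2]) . ']';
        },
        $html
    );

    // The branch-reset group (?|...) stores the attribute value in group 1
    // whether it is double-quoted, single-quoted, or unquoted.
    $value = '(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))';

    // Links: href may appear anywhere among the attributes.
    $text = preg_replace_callback(
        '#<a\s[^>]*?href\s*=\s*' . $value . '[^>]*>(.*?)</a>#is',
        function (array $m): string {
            return '[url=' . $m[1] . ']' . $m[2] . '[/url]';
        },
        $text
    );

    // Images: self-closing or not, with src anywhere among the attributes.
    return preg_replace_callback(
        '#<img\s[^>]*?src\s*=\s*' . $value . '[^>]*>#is',
        function (array $m): string {
            return '[img]' . $m[1] . '[/img]';
        },
        $text
    );
}
```

An input such as <a class='x' href='http://example.com/A'>Link</a> is left untouched by the strict pattern above, because the tag does not start with exactly <a href=". The tolerant version converts it. Nested or broken markup can still defeat both versions.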

Not a trivial task. I looked into this a while back, and the best code I came across was this one: cbparser
