How to convert HTML-ENTITIES and preg_replace in PHP

I'm trying to convert &nbsp; to whitespace and then use preg_replace to run a regex on the result, like this:

$title = "&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2";
$title = mb_convert_encoding($title, 'UTF-8', 'HTML-ENTITIES');
//$title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8');
// (I can use either mb_convert_encoding() or html_entity_decode();
// both give the same output: TEST < Ok.2-2)

// So now I have TEST < Ok.2-2
// I want to put a space after "Ok.", so I use preg_replace()
$replace = "~\s+(ok[.]?)~i";
$title = preg_replace($replace, ' OK. ', $title, -1);
$title = preg_replace('/\s+/', ' ', $title);
$title = trim($title);

// The result is still TEST < Ok.2-2 (does not work!)
echo($title);

With this code, mb_convert_encoding and html_entity_decode both work well, but when I use preg_replace to match the whitespace, it does not seem to find the whitespace that was converted from &nbsp;.

Current output: TEST < Ok.2-2

Expected output: TEST < OK. 2-2

MY SOLUTION FOR NOW

I added a str_replace that hard-codes the replacement of &nbsp; with a space, and then use mb_convert_encoding or html_entity_decode to convert the remaining HTML entities.

$title = '&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2';
$title = str_replace('&nbsp;', ' ', $title);
$title = mb_convert_encoding($title, 'UTF-8', 'HTML-ENTITIES');
//$title = html_entity_decode($title, ENT_NOQUOTES, 'UTF-8');
// (I can use either mb_convert_encoding() or html_entity_decode();
// both give the same output: TEST < Ok.2-2)

// So now I have TEST < Ok.2-2
// I want to put a space after "Ok.", so I use preg_replace()
$replace = '~\s+(ok[.]?)~i';
$title = preg_replace($replace, ' OK. ', $title, -1);
$title = preg_replace('/\s+/', ' ', $title);
$title = trim($title);

// The result is TEST < OK. 2-2 (works!)
echo($title);

Current output: TEST < OK. 2-2

Expected output: TEST < OK. 2-2

Is there a better solution?

I think this will give you what you are after.

$title = trim(
    preg_replace('~\s+~', ' ',
        str_ireplace(array('&nbsp;', ' ok.'), array(' ', ' OK. '),
            "&nbsp;TEST&nbsp;Ok.2-2")
    )
);

This will:

  1. Strip leading and trailing whitespace ( trim )
  2. Replace runs of whitespace with a single space ( preg_replace('~\s+~', ' ', ...) )
  3. Replace &nbsp; with a single space ( str_ireplace )
  4. Replace ok. with OK., case-insensitively ( str_ireplace )
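
The list above reads from the outermost call inwards, while PHP evaluates the calls in the opposite order. As a rough sketch, the same nested expression can be unrolled so that each intermediate value is visible; the variable names $input, $replaced and $collapsed are illustrative only and do not come from the original answer.

```php
<?php
$input = '&nbsp;TEST&nbsp;Ok.2-2';

// List items 3 and 4: str_ireplace() works through the search array in order,
// so '&nbsp;' becomes ' ' first, and ' ok.' is then matched case-insensitively.
$replaced = str_ireplace(array('&nbsp;', ' ok.'), array(' ', ' OK. '), $input);

// List item 2: collapse every run of whitespace into a single space.
$collapsed = preg_replace('~\s+~', ' ', $replaced);

// List item 1: strip leading and trailing whitespace.
$title = trim($collapsed);

echo $title, PHP_EOL; // TEST OK. 2-2
```

Because &nbsp; is replaced while it is still an entity, the string never contains a no-break space character, so the plain \s pattern is enough here.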

Output:

TEST OK. 2-2

Your HTML entity decode example is correct: http://sandbox.onlinephpfunctions.com/code/eed7e30d507f7197585f29c1fdde9e7744fc572d

$title = html_entity_decode("&nbsp;TEST&nbsp;Ok.2-2", ENT_NOQUOTES, 'UTF-8');
echo $title;

Output:

TEST Ok.2-2
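
One way to see why the regex in the question still fails after a correct decode is to inspect the decoded bytes. The sketch below assumes UTF-8 output and PHP's default "C" locale; the variable names are illustrative. &nbsp; decodes to U+00A0 (NO-BREAK SPACE), which looks like a space when printed but is a different character from the ASCII space.

```php
<?php
// Decode the entities exactly as in the example above.
$decoded = html_entity_decode('&nbsp;TEST&nbsp;Ok.2-2', ENT_NOQUOTES, 'UTF-8');

// The first two bytes are the UTF-8 encoding of U+00A0: c2 a0.
$firstBytes = bin2hex(substr($decoded, 0, 2));

// Without the "u" modifier, \s is byte-oriented and (in the default "C"
// locale) does not treat the bytes c2 a0 as whitespace.
$matchesPlain = preg_match('~\s~', $decoded);

// Naming the code point explicitly in UTF-8 mode does match it.
$matchesNbsp = preg_match('~\x{00A0}~u', $decoded);

var_dump($firstBytes, $matchesPlain, $matchesNbsp);
```

This is consistent with the behaviour reported in the question: the decode step succeeds, but the later \s+ pattern has nothing to match.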

Edit:

<?php
$title = '&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2';
$title = trim(preg_replace('~\s+~', ' ', str_ireplace(array('&nbsp;', '&lt;', 'Ok.'), array(' ', '', ' OK. '), $title)));
echo $title;

It's probably safer to just replace the two entities with str_ireplace. If your string were <h1>&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2</h1> and you decoded it and then removed every <, the markup would no longer work as it did.

Output:

TEST OK. 2-2
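
If the decoded < should be kept, as in the question's expected output, one possible alternative is to decode first and then let the regex handle the no-break space explicitly. This is only a sketch: the function name normalizeTitle is hypothetical, and it assumes the input is plain text rather than markup, so the caveat above about decoding strings that contain tags still applies.

```php
<?php
// Hypothetical helper; the name is not from the original post.
function normalizeTitle(string $raw): string
{
    // Decode all entities: &nbsp; becomes U+00A0 and &lt; becomes "<".
    $title = html_entity_decode($raw, ENT_NOQUOTES, 'UTF-8');

    // Turn every run of whitespace, including U+00A0, into one ASCII space.
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_replace() returns null on invalid UTF-8, so fall back to the input.
    $title = preg_replace('~[\s\x{00A0}]+~u', ' ', $title) ?? $title;

    // Same pattern as in the question: "ok" or "ok." after whitespace.
    $title = preg_replace('~\s+ok[.]?~i', ' OK. ', $title);

    // Collapse any doubled spaces and strip the ends.
    return trim(preg_replace('~\s+~', ' ', $title));
}

echo normalizeTitle('&nbsp;TEST&nbsp;&lt;&nbsp;Ok.2-2'), PHP_EOL; // TEST < OK. 2-2
```

Listing \x{00A0} in the character class keeps the intent explicit and does not depend on how \s is defined in UTF-8 mode.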
