Replace excess whitespaces and line-breaks with PHP?

$string = "My    text       has so    much   whitespace    




Plenty of    spaces  and            tabs";

echo preg_replace("/\s\s+/", " ", $string);

I read PHP's documentation and followed the preg_replace tutorial, but this code produces:

My text has so much whitespace Plenty of spaces and tabs

How can I turn it into:

My text has so much whitespace
Plenty of spaces and tabs

First, I'd like to point out that new lines can be \r, \n, or \r\n, depending on the operating system.

My solution:

echo preg_replace('/[ \t]+/', ' ', preg_replace('/[\r\n]+/', "\n", $string));

Which could be separated into two lines if necessary:

$string = preg_replace('/[\r\n]+/', "\n", $string);
echo preg_replace('/[ \t]+/', ' ', $string);

Update:

An even better solution would be this one:

echo preg_replace('/[ \t]+/', ' ', preg_replace('/\s*$^\s*/m', "\n", $string));

Or:

$string = preg_replace('/\s*$^\s*/m', "\n", $string);
echo preg_replace('/[ \t]+/', ' ', $string);

I've improved the regular expression that collapses multiple line breaks into a single one. It uses the "m" modifier (which makes ^ and $ match at the start and end of each line) and removes any \s characters (space, tab, carriage return, line feed) that sit at the end of one line and the beginning of the next. This solves the problem of empty lines that contain nothing but spaces. With my previous example, a line filled only with spaces would have been left behind as an extra line.
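The difference between the two variants can be checked with a small sketch. The helper names and the sample inputs below are made up for illustration; only the patterns come from the answer above.

```php
<?php
// Hypothetical helpers; only the regular expressions are taken from the answer.
function collapseSimple(string $s): string
{
    $s = preg_replace('/[\r\n]+/', "\n", $s);    // runs of line breaks -> one "\n"
    return preg_replace('/[ \t]+/', ' ', $s);    // runs of spaces/tabs -> one space
}

function collapseMultiline(string $s): string
{
    $s = preg_replace('/\s*$^\s*/m', "\n", $s);  // whitespace around an empty line -> one "\n"
    return preg_replace('/[ \t]+/', ' ', $s);
}

// A whitespace-only line followed by an empty line.
$input = "first   line\n   \n\nsecond\tline";

echo json_encode(collapseSimple($input)), "\n";     // "first line\n \nsecond line"
echo json_encode(collapseMultiline($input)), "\n";  // "first line\nsecond line"

// Caveat: `$^` needs a position that sits directly between two line breaks,
// so a whitespace-only line with no empty line next to it is left in place.
echo json_encode(collapseMultiline("a\n   \nb")), "\n"; // "a\n \nb"
```

If that last case matters, a first pattern such as '/\s*[\r\n]\s*/' also swallows whitespace-only lines, at the cost of stripping leading indentation from the following line.
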

Edited the right answer. From PHP 5.2.4 or so, the following code will do:

echo preg_replace('/\v(?:[\v\h]+)/', '', $string);
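
For comparison, here is a minimal sketch of a variant built on the same escapes (\h for horizontal whitespace, \v for vertical whitespace, plus \R for any line break) that replaces each run of line breaks with a single "\n" rather than the empty string. The function name squeezeWhitespace is hypothetical, and the "u" modifier assumes the input is valid UTF-8; without it, as far as I understand, \h and \v are matched byte-wise and can hit bytes inside multibyte characters.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns null if $s is not valid UTF-8 (preg_replace fails under the "u" modifier).
function squeezeWhitespace(string $s): ?string
{
    // Spaces/tabs before a line break, the break itself, and any following
    // whitespace (including further line breaks) become a single "\n".
    $s = preg_replace('/\h*\R[\h\v]*/u', "\n", $s);
    if ($s === null) {
        return null;
    }
    // Remaining runs of horizontal whitespace become one space.
    return preg_replace('/\h+/u', ' ', $s);
}

$string = "My    text   \n\n\n \t Plenty of\t\tspaces";
echo squeezeWhitespace($string);
// My text
// Plenty of spaces
```

Note that this sketch also removes leading indentation on the line that follows a break, which may or may not be wanted.
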
//Newline and tab space to single space

$from_mysql = str_replace(array("\r\n", "\r", "\n", "\t"), ' ', $from_mysql);


// Multiple spaces to single space ( using regular expression)

// ereg_replace() was removed in PHP 7, so preg_replace() is used here instead
$from_mysql = preg_replace('/ {2,}/', ' ', $from_mysql);

// Replaces 2 or more spaces with a single space, {2,} indicates that you are looking for 2 or more than 2 spaces in a string.

Replace Multiple Newline, Tab, Space

$text = preg_replace("/[\r\n]+/", "\n", $text);
$text = preg_replace("/\s+/", ' ', $text);

Tested :)

This would COMPLETELY MINIFY the entire string (such as a large blog article) while preserving all HTML tags in place.

$email_body = str_replace(PHP_EOL, ' ', $email_body);
    //PHP_EOL = PHP_End_Of_Line - would remove new lines too
$email_body = preg_replace('/[\r\n]+/', "\n", $email_body);
$email_body = preg_replace('/[ \t]+/', ' ', $email_body);

Alternative approach:

echo preg_replace_callback("/\s+/", function ($match) {
    $result = array();
    $prev = null;
    foreach (str_split($match[0], 1) as $char) {
        if ($prev === null || $char != $prev) {
            $result[] = $char;
        }

        $prev = $char;
    }

    return implode('', $result);
}, $string);

Output:

My text has so much whitespace
Plenty of spaces and tabs

Edit: Re-added this because it is a different approach. It's probably not what was asked for, but at least it will not merge groups of different whitespace (e.g. space, tab, tab, space, nl, nl, space, space would become space, tab, space, nl, space).
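
The same idea (collapsing only runs of one identical whitespace character) can also be sketched with a backreference instead of a callback. This is an alternative formulation, not the code from the answer, and collapseRepeats is a made-up name.

```php
<?php
// Hypothetical helper: (\s) captures one whitespace character and \1+ matches
// one or more further copies of that same character; the run is replaced by one copy.
function collapseRepeats(string $s): string
{
    return preg_replace('/(\s)\1+/', '$1', $s);
}

// space, tab, tab, space, nl, nl, space, space
echo json_encode(collapseRepeats("a \t\t \n\n  b")), "\n"; // "a \t \n b"
```
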

I had the same problem when passing echoed data from PHP to JavaScript (formatted as JSON). The string was peppered with useless \r\n and \t characters that are neither required nor displayed on the page.

The solution I ended up using is another way of echoing. That saves a lot of server resources compared to preg_replace (as suggested by other people here).

Here are the before and after in comparison:

Before:

echo '
<div>

    Example
    Example

</div>
';

Output:

<div>\r\n\r\n\tExample\r\n\tExample\r\n\r\n</div>

After:

echo 
'<div>',

    'Example',
    'Example',

'</div>';

Output:

<div>ExampleExample</div>

(Yes, echo accepts several arguments separated by commas; you are not limited to concatenating with dots.)

I'm not sure whether this will be useful, nor am I absolutely positive it works as it should, but it seems to be working for me.

A function that clears multiple spaces and anything else you want or don't want, and produces either a single-line string or a multi-line string (depending on the passed arguments/options). It can also remove or keep characters from other languages and convert tabs to spaces.

/** ¯\_(ツ)_/¯ Hope it's useful to someone. **/
// If $multiLine is null this removes spaces too. <options>'[:emoji:]' with $l = true allows only known emoji.
// <options>'[:print:]' with $l = true allows all utf8 printable chars (including emoji).
// **** TODO: If a unicode emoji or language char is used in $options while $l = false; we get an odd � symbol replacement for any non-matching char. $options char seems to get through, regardless of $l = false ? (bug (?)interesting)
function alphaNumericMagic($value, $options = '', $l = false, $multiLine = false, $tabSpaces = "    ") {
    $utf8Emojis = '';
    $patterns = [];
    $replacements = [];
    if ($l && preg_match("~(\[\:emoji\:\])~", $options)) {
        $utf8Emojis = [
            '\x{1F600}-\x{1F64F}', /* Emoticons */
            '\x{1F9D0}-\x{1F9E6}',
            '\x{1F300}-\x{1F5FF}', /* Misc Characters */ // \x{1F9D0}-\x{1F9E6}
            '\x{1F680}-\x{1F6FF}', /* Transport and Map */
            '\x{1F1E0}-\x{1F1FF}' /* Flags (iOS) */
        ];
        $utf8Emojis = implode('', $utf8Emojis);
    }
    $options = str_replace("[:emoji:]", $utf8Emojis, $options);
    if (!preg_match("~(\[\:graph\:\]|\[\:print\:\]|\[\:punct\:\]|\\\-)~", $options)) {
        $value = str_replace("-", ' ', $value);
    }
    if ($l) {
        $l = 'u';
        $options = $options . '\p{L}\p{N}\p{Pd}';
    } else { $l = ''; }
    if (preg_match("~(\[\:print\:\])~", $options)) {
        $patterns[] = "/[ ]+/m";
        $replacements[] = " ";
    }
    if ($multiLine) {
        $patterns[] = "/(?<!^)(?:[^\r\na-z0-9][\t]+)/m";
        $patterns[] = "/[ ]+(?![a-z0-9$options])|[^a-z0-9$options\s]/im$l";
        $patterns[] = "/\t/m";
        $patterns[] = "/(?<!^)$tabSpaces/m";
        $replacements[] = " ";
        $replacements[] = "";
        $replacements[] = $tabSpaces;
        $replacements[] = " ";
    } else if ($multiLine === null) {
        $patterns[] = "/[\r\n\t]+/m";
        $patterns[] = "/[^a-z0-9$options]/im$l";
        $replacements = "";
    } else {
        $patterns[] = "/[\r\n\t]+/m";
        $patterns[] = "/[ ]+(?![a-z0-9$options\t])|[^a-z0-9$options ]/im$l";
        $replacements[] = " ";
        $replacements[] = "";
    }
    return preg_replace($patterns, $replacements, $value);
}

Example usage:

header('Content-Type: text/html; charset=utf-8', true);
$string = "fjl!sj\nfl _  sfjs-lkjf\r\n\tskj 婦女與環境健康 fsl \tklkj\thl jhj ⚧😄 lkj ⸀ skjfl gwo lsjowgtfls s";
echo "<textarea style='width:100%; height:100%;'>";
echo alphaNumericMagic($string, '⚧', true, null);
echo "\n\nAND\n\n";
echo alphaNumericMagic($string, '[:print:]', true, true);
echo "</textarea>";

Results in:

fjlsjflsfjslkjfskj婦女與環境健康fslklkjhljhj⚧lkjskjflgwolsjowgtflss

AND

fjl!sj
fl _ sfjs-lkjf
    skj 婦女與環境健康 fsl klkj hl jhj ⚧😄 lkj ⸀ skjfl gwo lsjowgtfls s

Try with:

$string = "My    text       has so    much   whitespace    




Plenty of    spaces  and            tabs";
// Remove duplicate newlines ("+" rather than "*", which would also match the empty string)
$string = preg_replace("/[\n]+/", "\n", $string);
// Preserve newlines while replacing runs of other whitespace with a single space
echo preg_replace("/[ \t]+/", " ", $string);
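
The choice of quantifier matters in patterns like these: "+" needs at least one character, whereas "*" also matches the empty string, so the replacement gets inserted at every position in the input. A small sketch with made-up sample strings:

```php
<?php
// With "*", the pattern matches the empty string before and after every character.
echo json_encode(preg_replace("/[\n]*/", "\n", "ab")), "\n";       // "\na\nb\n"

// With "+", only real runs of line breaks are replaced.
echo json_encode(preg_replace("/[\n]+/", "\n", "a\n\n\nb")), "\n"; // "a\nb"
```
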

Why are you doing this?
HTML displays only one space even if you use more than one.

For example:

<i>test               content 1       2 3 4            5</i>

The output will be:
test content 1 2 3 4 5

If you need more than a single space in HTML, you have to use &nbsp;
