
How do I remove blank lines from text in PHP?

I need to remove blank lines (lines containing only whitespace, or completely empty) in PHP. I'm using these regular expressions, but they don't work:

$str = ereg_replace('^[ \t]*$\r?\n', '', $str);
$str = preg_replace('^[ \t]*$\r?\n', '', $str);

I want this text:

blahblah

blahblah

   adsa 


sad asdasd

to become:

blahblah
blahblah
   adsa 
sad asdasd

// New line is required to split non-blank lines
$string = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);

The regular expression above breaks down as follows:

/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/
    1st Capturing group (^[\r\n]*|[\r\n]+)
        1st Alternative: ^[\r\n]*
            ^ asserts position at the start of the string
            [\r\n]* matches a single character present in the list below
                Quantifier: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
                \r matches a carriage return (ASCII 13)
                \n matches a line-feed (newline) character (ASCII 10)
        2nd Alternative: [\r\n]+
            [\r\n]+ matches a single character present in the list below
                Quantifier: between one and unlimited times, as many times as possible, giving back as needed [greedy]
                \r matches a carriage return (ASCII 13)
                \n matches a line-feed (newline) character (ASCII 10)
    [\s\t]* matches a single character present in the list below
        Quantifier: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
        \s matches any whitespace character [\r\n\t\f ]
        \t matches a tab character (ASCII 9)
    [\r\n]+ matches a single character present in the list below
        Quantifier: between one and unlimited times, as many times as possible, giving back as needed [greedy]
        \r matches a carriage return (ASCII 13)
        \n matches a line-feed (newline) character (ASCII 10)
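One way to check the breakdown is to run the accepted answer's pattern on the sample text from the question. This is only a sketch: $input and $expected are illustrative names, and the text is written with double-quoted "\n" escapes so the blank lines are explicit.

```php
<?php
// The question's sample text, using "\n" line endings.
$input = "blahblah\n\nblahblah\n\n   adsa \n\n\nsad asdasd";

// The accepted answer's pattern: a run of newlines, optional whitespace,
// then more newlines, is collapsed into a single "\n".
$output = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $input);

// Leading and trailing spaces on "   adsa " survive; only the blank lines go.
$expected = "blahblah\nblahblah\n   adsa \nsad asdasd";
var_dump($output === $expected); // bool(true)
```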

Your ereg_replace() solution is wrong because the ereg/eregi functions are deprecated (they were removed entirely in PHP 7.0). Your preg_replace() won't even compile, because the pattern has no delimiters, but if you add delimiters and set multiline mode, it will work fine:

$str = preg_replace('/^[ \t]*[\r\n]+/m', '', $str);

The m modifier allows ^ to match the beginning of a logical line rather than just the beginning of the whole string. The start-of-line anchor is necessary because without it the regex would match the newline at the end of every line, not just the blank ones. You don't need the end-of-line anchor ($) because you're actively matching the newline characters, but it doesn't hurt.
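To see what the m modifier changes, here is a small sketch that runs the corrected pattern on the question's sample text, with and without m. The variable names are illustrative only.

```php
<?php
$input = "blahblah\n\nblahblah\n\n   adsa \n\n\nsad asdasd";

// With m, ^ can match right after every "\n", so only lines holding nothing
// but spaces/tabs before their line break are removed.
$output = preg_replace('/^[ \t]*[\r\n]+/m', '', $input);
var_dump($output === "blahblah\nblahblah\n   adsa \nsad asdasd"); // bool(true)

// Without m, ^ only matches at offset 0, so the blank lines in the middle stay.
var_dump(preg_replace('/^[ \t]*[\r\n]+/', '', $input) === $input); // bool(true)
```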

The accepted answer gets the job done, but it's more complicated than it needs to be. The regex has to match either the beginning of the string (^[\r\n]*, multiline mode not set) or at least one newline ([\r\n]+), followed by optional whitespace ([\s\t]*) and at least one more newline ([\r\n]+). So, in the special case of a string that starts with one or more blank lines, they'll be replaced with one blank line. I'm pretty sure that's not the desired outcome.
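The leading-blank-lines case can be reproduced with a short sketch comparing the accepted answer's pattern with the /m pattern suggested above. The input string is an illustrative assumption.

```php
<?php
$input = "\n\nfirst line\nsecond line";

// Accepted answer: the leading run of blank lines shrinks to one blank line.
$accepted = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $input);
var_dump($accepted === "\nfirst line\nsecond line"); // bool(true)

// The /m pattern: leading blank lines disappear completely.
$multiline = preg_replace('/^[ \t]*[\r\n]+/m', '', $input);
var_dump($multiline === "first line\nsecond line"); // bool(true)
```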

But most of the time, the accepted answer's regex replaces two or more consecutive newlines, along with any horizontal whitespace (spaces or tabs) between them, with one linefeed. That's the intent, anyway. The author seems to expect \s to match just the space character (\x20), when in fact it matches any whitespace character. That's a very common mistake. The actual list varies from one regex flavor to the next, but at minimum you can expect \s to match whatever [ \t\f\r\n] matches.
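A quick sketch of the \s point: counting matches over a string of five different whitespace characters shows that \s covers line breaks too, while a class written as [ \t] does not.

```php
<?php
// Space, tab, form feed, carriage return, line feed.
$whitespace = " \t\f\r\n";

// \s matches all five, so [\s\t]* can run across line breaks.
$count = preg_match_all('/\s/', $whitespace);
var_dump($count); // int(5)

// A class limited to horizontal whitespace stays on one line.
var_dump(preg_match_all('/[ \t]/', $whitespace)); // int(2)
```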

Actually, in PHP you have a better option:

$str = preg_replace('/^\h*\v+/m', '', $str);

\h matches any horizontal whitespace character, and \v matches any vertical whitespace character.
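A minimal sketch of the \h/\v version on LF and CRLF input. It assumes PHP's PCRE uses the usual LF newline convention for ^ in multiline mode; the sample strings are illustrative.

```php
<?php
// \h = horizontal whitespace (space, tab, ...); \v = vertical whitespace
// (\n, \r, vertical tab, form feed, ...). With /m, ^ anchors every line start.
$lf   = "a\n \t \n\nb";
$crlf = "a\r\n \r\nb";

// The whitespace-only line and the empty line after it are removed together.
var_dump(preg_replace('/^\h*\v+/m', '', $lf) === "a\nb");     // bool(true)
// \v+ consumes the whole "\r\n" of the blank line; the first "\r\n" is kept.
var_dump(preg_replace('/^\h*\v+/m', '', $crlf) === "a\r\nb"); // bool(true)
```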

Just explode the text into an array of lines, remove the empty lines using array_filter, and implode the array again.

$tmp = explode("\n", $str);
$tmp = array_filter($tmp);
$str = implode("\n", $tmp);

Or in one line:

$str = implode("\n", array_filter(explode("\n", $str)));

I haven't measured it, but this may be faster than preg_replace.
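One caveat with the default array_filter() callback: it drops every value PHP considers falsy, so a whitespace-only line survives while a line containing just 0 is removed. A minimal sketch that passes an explicit trim() check instead (removeBlankLines is a hypothetical name; the arrow function needs PHP 7.4+):

```php
<?php
// Hypothetical helper: explode/array_filter/implode with an explicit callback.
// Assumes "\n" separators; a leftover "\r" from "\r\n" counts as whitespace,
// so trim() still treats such a line as blank (kept lines keep their "\r").
function removeBlankLines(string $str): string
{
    $lines = explode("\n", $str);
    // Keep a line only if something other than whitespace survives trim().
    $kept = array_filter($lines, fn(string $line): bool => trim($line) !== '');
    return implode("\n", $kept);
}

$sample = "a\n0\n  \nb";

// Default callback: "0" is falsy and dropped, the "  " line is truthy and kept.
var_dump(implode("\n", array_filter(explode("\n", $sample))) === "a\n  \nb"); // bool(true)

// Explicit callback: the "0" line is kept, the whitespace-only line is removed.
var_dump(removeBlankLines($sample) === "a\n0\nb"); // bool(true)
```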

The comment by Bythos at Jamie's link above worked for me:

/^\n+|^[\t\s]*\n+/m

I didn't want to strip all of the newlines, just the empty/whitespace-only lines. This does the trick!

There is no need to overcomplicate things. This can be achieved with a simple, short regular expression:

$text = preg_replace("/(\R){2,}/", "$1", $text);

(\R) matches any newline sequence (\n, \r\n, \r, ...).
{2,} matches two or more occurrences.
$1 uses the first backreference, i.e. the last line break actually matched (so the text's own EOL style is kept), as the replacement.
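A small sketch of how this pattern behaves, including one limitation worth knowing: a line that contains only spaces is not a run of line breaks, so it is left in place. collapseNewlines is a hypothetical wrapper name.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function collapseNewlines(string $text): string
{
    // (\R){2,}: two or more line breaks in a row; $1 holds the last one matched.
    return preg_replace('/(\R){2,}/', '$1', $text);
}

var_dump(collapseNewlines("a\n\n\nb") === "a\nb");       // bool(true)
// \R treats "\r\n" as a single line break, so CRLF text stays CRLF.
var_dump(collapseNewlines("a\r\n\r\nb") === "a\r\nb");   // bool(true)
// A line holding only spaces breaks the run, so it is not removed.
var_dump(collapseNewlines("a\n  \nb") === "a\n  \nb");   // bool(true)
```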

This was answered a long time ago, but the problem still benefits greatly from preg_replace with a much simpler pattern:

$result = preg_replace('/\s*($|\n)/', '\1', $subject);

Pattern: Remove all whitespace before a newline, or at the end of the string.

The longest match wins:

  • Because \s has the greedy quantifier * and also matches \n, consecutive empty lines are matched in one go.

  • Because \s matches \r as well, \r\n newline sequences are supported; however, a lone \r (without \n) is not.

  • When $ matches at the end of the buffer, the backreference \1 is empty, so trailing whitespace at the very end is handled too.
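The three bullet points can be checked with a short sketch; $squeeze is just an illustrative closure around the pattern from the answer.

```php
<?php
$squeeze = function (string $subject): string {
    return preg_replace('/\s*($|\n)/', '\1', $subject);
};

// Trailing spaces go, blank lines collapse, and the final newline is dropped.
var_dump($squeeze("a  \n\n\nb  \n") === "a\nb");   // bool(true)
// "\r\n" works because \s also swallows the "\r"; the kept break is "\n".
var_dump($squeeze("a\r\n\r\nb") === "a\nb");       // bool(true)
// A lone "\r" is not treated as a line break, so nothing changes.
var_dump($squeeze("a\r\rb") === "a\r\rb");         // bool(true)
```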

If leading (empty) lines need to be removed as well, they have to be matched without being captured (this was not directly asked for, but could be appropriate):

$result = preg_replace('/^(?:\s*\n)+|\s*($|\n)/', '\1', $subject);
#                        '----------'

Pattern: Also remove all leading whitespace (when the first line(s) are empty).

And if the newline at the end of the buffer should be normalized differently (always a newline at the end instead of never), it has to be appended: . "\n"
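A sketch of the leading-lines variant, followed by the appended "\n". The sample text is an assumption chosen to show that indentation on the first real line survives while the blank lines before it do not.

```php
<?php
$subject = "\n  \n  indented\n\n\nlast  \n\n";

// First alternative: strip the leading run of blank lines (nothing is captured,
// so \1 is empty). Second alternative: whitespace before a newline or the end.
$result = preg_replace('/^(?:\s*\n)+|\s*($|\n)/', '\1', $subject);
var_dump($result === "  indented\nlast"); // bool(true)

// Appending "\n" turns "never a newline at the end" into "always one".
var_dump($result . "\n" === "  indented\nlast\n"); // bool(true)
```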

This variant is portable across \r\n, \r and \n newline sequences, using (?>\r\n|\r|\n) or \R:

$result = preg_replace('/^(?> |\t|\r\n|\r|\n)+|(?> |\t|\r\n|\r|\n)*($|(?>\r\n|\r|\n))/', '\1', $subject);
# or:
$result = preg_replace('/^(?:\s*\R)+|\s*($|\R)/', '\1', $subject);

Pattern: Support all newline sequences.

The downside is that the newlines cannot be normalized (e.g. any of the three to \n).

Therefore, it can make sense to normalize newlines before removing blank lines:

$result = preg_replace(['/(?>\r\n|\n|\r)/', '/\s*($|\n)/'], ["\n", '\1'], $subject);
# or:
$result = preg_replace(['/\R/u', '/\s*($|\n)/'], ["\n", '\1'], $subject);
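A sketch of the two-step version on mixed line endings, including the lone \r case that the single-pattern version could not handle. The sample string is illustrative.

```php
<?php
// Step 1 turns every line break (\r\n, \r, \n, and other Unicode line breaks
// in /u mode) into "\n"; step 2 removes trailing whitespace and blank lines.
$subject = "a\r\rb \r\n\r\nc";
$result  = preg_replace(['/\R/u', '/\s*($|\n)/'], ["\n", '\1'], $subject);

var_dump($result === "a\nb\nc"); // bool(true)
```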

Normalizing first also creates an opportunity to do some other normalization apart from the line handling.

For example, removing trailing whitespace and adding the missing newline at the end of the file.

Then more advanced line normalization is possible, for example zero empty lines at the beginning and end, and otherwise no more than two consecutive empty lines:

$result = preg_replace(
    ['/[ \t]*($|\R)/u', '/^\n*|(\n)\n*$|(\n{3})\n+/'], 
    ["\n"             , '\1\2'                      ], 
    $subject
);

The second pattern already benefits from the first pattern's replacements (by then, every line break is a plain \n).

The power of preg_replace here lies in wisely choosing the backreference(s) to replace with.

Also, using multiple patterns can greatly simplify things and keep the process maintainable.

Try this one:

$str = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\r\n", $str);

If you write the output to a text file, it will look the same in plain Notepad and WordPad as well as in text editors such as Notepad++ (because the replacement uses \r\n line endings).

Use this:

// Needs delimiters and the m modifier so that ^ matches at every line start.
$str = preg_replace('/^\s*\r?\n/m', '', $str);

The accepted answer leaves an extra line break at the end of the string when the text ends with blank lines. Using rtrim() will remove this final line break:

rtrim(preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string));
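A sketch that reproduces the leftover line break and the rtrim() fix. Note that rtrim() with no second argument strips spaces, tabs, "\r", "\n", NUL and vertical tabs, so trailing spaces on the last line go too. The sample string is illustrative.

```php
<?php
$string = "a\n\nb\n\n";
$collapsed = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);

// One line break is still left at the very end.
var_dump($collapsed === "a\nb\n");      // bool(true)
var_dump(rtrim($collapsed) === "a\nb"); // bool(true)
```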

Based on this answer, the following works fine for me!

$str = "<html>
<body>";

echo str_replace(array("\r", "\n"), '', $str);

    <?php

    function del_blanklines_in_array_q($ar){
        $strip = array();
        foreach($ar as $k => $v){
            $ll = strlen($v);
            while($ll--){
                if(ord($v[$ll]) > 32){  //hex /0x20 int 32 ascii SPACE
                    $strip[] = $v; break; 
                }
            }
        }
        return $strip;
    }

    function del_blanklines_in_file_q($in, $out){
        // in filename, out filename
        $strip = del_blanklines_in_array_q(file($in));
        file_put_contents($out, $strip );
    }

$file = "file_name.txt";
$file_data = file_get_contents($file);
$file_data_after_remove_blank_line = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $file_data );
file_put_contents($file,$file_data_after_remove_blank_line);

function trimblanklines($str) {
    return preg_replace('`\A[ \t]*\r?\n|\r?\n[ \t]*\Z`','',$str);
}

This one only removes blank lines at the beginning and end, not in the middle (in case anyone else is looking for this).
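A quick check of trimblanklines() as written. One detail the sketch suggests: \A[ \t]*\r?\n consumes a single line break, so only one leading blank line is removed per call. The inputs are illustrative.

```php
<?php
function trimblanklines($str) {
    return preg_replace('`\A[ \t]*\r?\n|\r?\n[ \t]*\Z`', '', $str);
}

var_dump(trimblanklines("\nabc\n") === "abc");   // bool(true)
// Blank lines in the middle are left alone.
var_dump(trimblanklines("a\n\nb") === "a\n\nb"); // bool(true)
// Only one of the two leading blank lines is removed.
var_dump(trimblanklines("\n\nabc") === "\nabc"); // bool(true)
```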

nl2br(preg_replace('/^\v+/m', '', $r_msg))
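A short sketch of this one-liner with an illustrative $r_msg. Since \v only matches vertical whitespace, a line containing spaces would not be removed by this pattern.

```php
<?php
$r_msg = "Hello\n\n\nWorld";

// Drop the empty lines first, then convert the remaining newline to <br />.
$html = nl2br(preg_replace('/^\v+/m', '', $r_msg));
var_dump($html === "Hello<br />\nWorld"); // bool(true)
```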

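Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.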

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM