
PHP Regex match first newline after x characters for a trimming function

I'm writing a trimming function that takes a string, finds the first newline (\n) character after the 500th character, and returns the string up to that newline. Basically, if there are \n at indices 200, 400, and 600, I want the function to return the first 600 characters of the string (not including the \n).

I tried:

$output = preg_replace('/([^%]{500}[^\n]+?)[^%]*/','$1',$output);

I used the percent sign because I couldn't find a character class that just encompassed "everything". Dot didn't do it because it excludes newlines. Unfortunately, my function fails miserably. Any help or guidance would be appreciated.

Personally I would avoid regex and use simple string functions:

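// $str is the original string
// Find the first \n at or after offset 500; strpos() rejects an offset past the end of the string
$nl = strlen( $str ) > 500 ? strpos( $str, "\n", 500 ) : false;
// No such newline: keep the whole string
$sub = ( $nl === false ) ? $str : substr( $str, 0, $nl );
// Flatten the earlier newlines into spaces
$final = str_replace( "\n", ' ', $sub );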

You might need to check for \r\n as well, i.e. normalize first using str_replace( "\r\n", "\n", $str ).
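As a rough sketch, the steps above could be wrapped in one function. The name trim_after_newline and the $min parameter are made up for illustration, and the guards for short strings and for a missing newline are additions that are not in the answer. Unlike the snippet above, this version leaves earlier newlines in place instead of replacing them with spaces.

```php
<?php
// Hypothetical helper wrapping the strpos()/substr() approach.
function trim_after_newline(string $str, int $min = 500): string
{
    // Normalize Windows line endings so only "\n" has to be searched for.
    $str = str_replace("\r\n", "\n", $str);

    // strpos() rejects an offset beyond the end of the string,
    // so short strings are returned unchanged.
    if (strlen($str) <= $min) {
        return $str;
    }

    // First "\n" at or after byte offset $min.
    $nl = strpos($str, "\n", $min);

    // No newline at or after $min: keep the whole string.
    return $nl === false ? $str : substr($str, 0, $nl);
}

// Newlines end up at indices 200, 400 and 600 once "\r\n" is normalized.
$text = str_repeat('a', 200) . "\n"
      . str_repeat('b', 199) . "\r\n"
      . str_repeat('c', 199) . "\n"
      . 'tail';

$trimmed = trim_after_newline($text);
```

Note that strlen(), strpos() and substr() count bytes, so with multibyte text the cut-off is 500 bytes rather than 500 characters.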

You can add the s (DOTALL) modifier to make . match newlines, then just make the second bit ungreedy. I've also made it match everything if the string is under 500 characters and anchored it to the start:

// [^\n]* (rather than +) still matches when the character right after the first 500 is a newline
preg_match('/^.{500}[^\n]*|^.{0,500}$/s', $output, $matches);
$output = $matches[0];
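A minimal sketch of how this might be used defensively. The function name regex_trim is hypothetical. The sketch uses [^\n]* instead of [^\n]+, which is my own assumption, so that a newline sitting directly after the first 500 bytes still produces a match. It also falls back to the untouched input if preg_match() does not report a match.

```php
<?php
// Hypothetical wrapper around the DOTALL pattern.
function regex_trim(string $output): string
{
    // Without the u modifier, "." counts bytes, not characters.
    $ok = preg_match('/^.{500}[^\n]*|^.{0,500}$/s', $output, $matches);

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return $ok === 1 ? $matches[0] : $output;
}

// Newline directly after the first 500 bytes.
$edge = str_repeat('a', 500) . "\n" . 'rest';
$edgeTrimmed = regex_trim($edge);

// Newlines at indices 200, 400 and 600.
$long = str_repeat('a', 200) . "\n"
      . str_repeat('b', 199) . "\n"
      . str_repeat('c', 199) . "\n"
      . 'tail';
$longTrimmed = regex_trim($long);
```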

Use

'/(.{500,}?)(?=\n)/s'

as the pattern.

The /s at the end makes the dot match newlines. {500,} means "match 500 or more", and the question mark makes it match as few as possible. The (?=\n) is a positive lookahead, which means the whole matched string has to be followed by a \n, but the lookahead doesn't consume anything. So it checks that the 500+ character string is followed by a newline, but doesn't include the newline in the match (or the replace, for that matter).
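To illustrate, here is a small sketch that applies the pattern with preg_match(). The answer does not say which function to use, so that choice, the variable names, and the fallback to the whole string when nothing matches are all assumptions.

```php
<?php
// Newlines at indices 200, 400 and 600, as in the question.
$subject = str_repeat('a', 200) . "\n"
         . str_repeat('b', 199) . "\n"
         . str_repeat('c', 199) . "\n"
         . 'tail';

// Lazy {500,}? grows one byte at a time until the lookahead sees "\n".
if (preg_match('/(.{500,}?)(?=\n)/s', $subject, $m) === 1) {
    $result = $m[1];
} else {
    // No newline after the first 500 bytes: keep everything.
    $result = $subject;
}

// A short string never satisfies {500,}, so preg_match() returns 0.
$shortMatched = preg_match('/(.{500,}?)(?=\n)/s', "short\ntext");
```

Because the lookahead consumes nothing, $m[0] and $m[1] hold the same text here.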

Though the lookahead thingy is a little fancy in this case, I guess

'/(.{500,}?)\n/s'

would do just as well. I just like lookaheads :)
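One practical difference between the two patterns, sketched below with made-up variable names: without the lookahead, the newline becomes part of the full match, so the trimmed text has to be read from capture group 1 rather than from the whole match.

```php
<?php
// Newlines at indices 200, 400 and 600.
$subject = str_repeat('a', 200) . "\n"
         . str_repeat('b', 199) . "\n"
         . str_repeat('c', 199) . "\n"
         . 'tail';

// With the lookahead, the newline is checked but not consumed.
preg_match('/(.{500,}?)(?=\n)/s', $subject, $look);

// Without it, the newline is consumed and lands in the full match.
preg_match('/(.{500,}?)\n/s', $subject, $plain);

$sameGroup    = ($look[1] === $plain[1]);          // group 1 is identical
$plainHasNl   = ($plain[0] === $plain[1] . "\n");  // full match ends in "\n"
$lookHasNoNl  = ($look[0] === $look[1]);           // full match equals group 1
```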
