[英]Regular expression for utf-8 string sliceing at linebreaks or after a number of characters
I found a function on the web, that uses a regular experssion, to iterate over a string and inserts linebreaks after a specified number of characters, so it will fit into a narrow table cell with a fixed width. 我在网上找到了一个函数,该函数使用常规表达式来遍历字符串,并在指定数量的字符后插入换行符,因此它将适合固定宽度的狭窄表格单元格。 here is the function:
这是函数:
/**
* wordwrap for utf8 encoded strings
*
* @param string $str
* @param integer $len
* @param string $what
* @return string
* @author Milian Wolff <mail@milianw.de>
*/
function utf8_wordwrap($str, $width, $break, $cut = false) {
if (!$cut || $_SESSION['wordwrap']) {
$regexp = '#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){'.$width.'}#';
} else {
return $str; //if no wordwrap turned on, returns the original string
}
if (function_exists('mb_strlen')) {
$str_len = mb_strlen($str,'UTF-8');
} else {
$str_len = preg_match_all('/[\x00-\x7F\xC0-\xFD]/', $str, $var_empty);
}
$while_what = ceil($str_len / $width);
$i = 1;
$return = '';
while ($i < $while_what) {
preg_match($regexp, $str,$matches);
$string = $matches[0];
$return .= $string.$break;
$str = substr($str, strlen($string));
$i++;
}
return $return.$str;
}
here is the regexp: 这是正则表达式:
#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){20}#
It does its job well, if it's combined with a while loop until there is a line break character in the string. 如果将它与while循环结合使用,直到字符串中有换行符,它就会很好地完成工作。
An example string: 一个示例字符串:
1. first
2. second
3. third
The output of prag_match: prag_match的输出:
array (
0 => '1. first
2. second
3',
)
so it just counts for the 20th character, and returns it. 因此它只计算第20个字符并返回。
What I would need is: To make it return everything until a new line char (\\n) OR if there isn't any, return the first 20 char. 我需要的是:要使其返回所有内容,直到换行符(\\ n),或者如果没有,则返回前20个字符。 So the output in this case would be something like this:
因此,在这种情况下的输出将是这样的:
array (
0 => '1. first',
1 => '2. second',
2 => '3. third'
)
UPDATE: I tried Steve Robbins's answer and it worked perfectly, until the string had some spec UTF-8 characters in it. 更新:我尝试了史蒂夫·罗宾斯(Steve Robbins)的答案,并且效果很好,直到字符串中包含一些规范UTF-8字符。 It's my fault, I didn't provide a decent example in the first place.
这是我的错,我没有首先提供一个体面的例子。 Here is what it does:
这是它的作用:
<?php
header('Content-type: text/html; charset=UTF-8');
$input = '1. first
2. second
3. third
ez eg nyoulőűúúú3456789öüö987654323456789öü
pam
param';
$output = array();
foreach (explode("\n", $input) as $value) {
foreach (str_split($value, 20) as $v) {
$trimmed = trim($v);
if (!empty($trimmed))
$output[] = $trimmed;
}
}
var_dump($output);
And the output is: 输出为:
array(8) {
[0]=>
string(8) "1. first"
[1]=>
string(9) "2. second"
[2]=>
string(8) "3. third"
[3]=>
string(20) "ez eg nyoulőűúú�"
[4]=>
string(20) "�3456789öüö987654"
[5]=>
string(13) "323456789öü"
[6]=>
string(3) "pam"
[7]=>
string(5) "papam"
}
Why use regex? 为什么要使用正则表达式?
<?php
$input = '1. first
2. second
3. third';
$output = array();
foreach (explode("\n", $input) as $value) {
foreach (str_split($value, 20) as $v) {
$trimmed = trim($v);
if (!empty($trimmed))
$output[] = $trimmed;
}
}
var_dump($output);
Gives 给
array(3) {
[0]=>
string(8) "1. first"
[1]=>
string(9) "2. second"
[2]=>
string(8) "3. third"
}
Example: http://codepad.org/OoillEUu 示例: http : //codepad.org/OoillEUu
Thanks everyone for your efforts! 感谢大家的努力! I've found the solution here
我在这里找到了解决方案
<?php
header('Content-Type: text/html; charset=utf-8');
$input = '1. first
2. second
3. third
ez eg nyoulőűúúú3456789öüö987654323456789öü
pam
papam';
var_dump(utf8_wordwrap($input,20,"<br>",true));
function utf8_wordwrap($string, $width=20, $break="\n", $cut=false)
{
if($cut) {
// Match anything 1 to $width chars long followed by whitespace or EOS,
// otherwise match anything $width chars long
$search = '/(.{1,'.$width.'})(?:\s|$)|(.{'.$width.'})/uS';
$replace = '$1$2'.$break;
} else {
// Anchor the beginning of the pattern with a lookahead
// to avoid crazy backtracking when words are longer than $width
$pattern = '/(?=\s)(.{1,'.$width.'})(?:\s|$)/uS';
$replace = '$1'.$break;
}
return preg_replace($search, $replace, $string);
}
?>
string '1. first
<br>2. second
<br>3. third
<br>ez eg<br>nyoulőűúúú3456789öüö<br>987654323456789öü
<br>pam
<br>papam<br>' (length=122)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.