Regex / JavaScript: Split string to separate lines by max characters per line, looking n chars backwards for a possible whitespace?

This is similar to the question "How to split a string at every n characters or to nearest previous space". However, contrary to what I expected from its title, that solution does not work if there is just one long word without any whitespace.

So I need a regex that splits a string into separate lines (multiple times if needed) by max characters per line, looking backwards n characters for a possible whitespace (break there if found, otherwise at max length).

Edit 1: For example, with a max line length of 30 characters and a backwards whitespace lookup of 15 characters:

Loremipsumissimplydummytextofthe printing and typesetting industry.

That sentence's first word has a length of 32 characters. So the output should be:

Loremipsumissimplydummytextoft  # Line has a length of 30 chars
he printing and typesetting     # Cut before the word that would otherwise cross the 30-char limit
industry.

So the first word should be force-cut after the 30th character, as there was no whitespace.

The remaining string has a length of 28 (or 29 with the dash) before the word 'industry', so the 30th character falls inside a word, and the solution looks for the previous whitespace within a 15-character range. That line is broken before the word 'industry'.

Edit 2: Second example of text:

Loremipsumissimplydummytextofthe printing and typesetting industry. Loremipsumis simply dummytext ofthe printing and typesetting industry. Loremipsumissimplydummytextofthe printing and typesetting industry. Loremipsumis simply dummytext ofthe printing and typesetting industry.

Should output:

Loremipsumissimplydummytextoft
he printing and typesetting
industry. Loremipsumis simply
dummytext ofthe printing and
typesetting industry.
Loremipsumissimplydummytextoft
he printing and typesetting
industry. Loremipsumis simply
dummytext ofthe printing and
typesetting industry.

The use case for this regex is to format a long string into readable text, with the max line length enforced and each line starting with a non-whitespace character.

Optional requirement: When I added the example in Edit 1 after the initial posting, I also added an optional requirement to put a dash '-' at the start of the next line if a word was cut at max line length. I'm removing that from the example now and adding it as a separate optional requirement here.

So, an optional requirement: if a line is broken mid-word at max length and not at a whitespace, a dash should be appended at the end of that line (and not at the start of the next line, as I had originally described).

Loremipsumissimplydummytextoft-  # Line length 30+1 chars with an appended dash
he printing and typesetting     # Cut before the word that would otherwise cross the 30-char limit
industry.

You may use

    var s = "Loremipsumissimplydummytextofthe printing and typesetting industry. Loremipsumis simply dummytext ofthe printing and typesetting industry. Loremipsumissimplydummytextofthe printing and typesetting industry. Loremipsumis simply dummytext ofthe printing and typesetting industry.";
    // Group 1: a run of 30 non-whitespace chars (a word that has to be cut).
    // Group 2: up to 30 chars that are not followed by a non-whitespace char.
    var regex = /\s*(?:(\S{30})|([\s\S]{1,30})(?!\S))/g;
    console.log(
      s.replace(regex, function($0, $1, $2) {
        return $1 ? $1 + "-\n" : $2 + "\n";
      })
    );

Details

  • \s* - 0 or more whitespace chars.
  • (?: - start of the non-capturing group:
    • (\S{30}) - Group 1 (referred to with the $1 variable in the callback method): thirty (n) non-whitespace chars
    • | - or
    • ([\s\S]{1,30})(?!\S)) - Group 2 (referred to with the $2 variable in the callback method): any one to thirty (n) chars, as many as possible, but not immediately followed by a non-whitespace char.

The function($0,$1,$2) { return $1 ? $1 + "-\n" : $2 + "\n"; } part means that if Group 1 matched (that is, we matched a very long word that is cut into two parts), we replace the match with the Group 1 value plus a hyphen and a newline. Otherwise, if Group 2 matched, we replace the match with the Group 2 value plus a newline.
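The replacement logic described above can be wrapped in a small helper. This is only a sketch: the names `wrapWithRegex`, `maxLen` and `sample` are made up for illustration, and the final `trimEnd()` (which drops the newline emitted after the last line) is an addition that is not in the snippet above.

```javascript
// Hypothetical helper (not part of the answer): the same pattern,
// parameterised by the maximum line length.
function wrapWithRegex(text, maxLen) {
  const regex = new RegExp(
    String.raw`\s*(?:(\S{${maxLen}})|([\s\S]{1,${maxLen}})(?!\S))`,
    "g"
  );
  return text
    // Group 1 (cut): a long word cut at maxLen, so a dash is appended.
    // Group 2 (line): a normal line that ends at a whitespace or at the end of the input.
    .replace(regex, (_, cut, line) => (cut ? `${cut}-\n` : `${line}\n`))
    .trimEnd(); // drop the newline added after the last line
}

const sample = "Loremipsumissimplydummytextofthe printing and typesetting industry.";
console.log(wrapWithRegex(sample, 30));
// Loremipsumissimplydummytextoft-
// he printing and typesetting
// industry.
```

Because the leading `\s*` is outside both groups, the whitespace at a break point is consumed by the match but never copied into the output, which is why each line starts with a non-whitespace character.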

ES6+ compliant code snippet:

    const text = "Loremipsumissimplydummytextofthe printing and typesetting industry. Loremipsumis simply dummytext ofthe printing and typesetting industry. Loremipsumissimplydummytextofthe printing and typesetting industry. Loremipsumis simply dummytext ofthe printing and typesetting industry.";
    const lineMaxLen = 30;
    const wsLookup = 15; // Look backwards n characters for a whitespace
    // String.raw keeps the backslashes intact, so \s and \S reach the RegExp constructor unchanged.
    const regex = new RegExp(String.raw`\s*(?:(\S{${lineMaxLen}})|([\s\S]{${lineMaxLen - wsLookup},${lineMaxLen}})(?!\S))`, 'g');
    console.log(
      text.replace(regex, (_, x, y) => x ? `${x}-\n` : `${y}\n`)
    );
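As a cross-check for the regex, the same rule can be written as a plain loop. This is a different technique from the one in the answer, not a replacement for it, and the names `wrapText`, `maxLen`, `lookback` and `hyphenate` are assumptions made for this sketch. One reason it may be useful: the quantifier `{15,30}` above requires at least `lineMaxLen - wsLookup` characters, so a short final fragment (such as `industry.` in the Edit 1 example) appears not to be matched by that pattern, whereas the loop handles it explicitly.

```javascript
// Minimal procedural sketch of the wrapping rule (assumes lookback >= 1).
function wrapText(text, maxLen = 30, lookback = 15, hyphenate = false) {
  const isSpace = (ch) => /\s/.test(ch);
  const lines = [];
  let pos = 0;
  while (pos < text.length) {
    // Lines must not start with whitespace.
    while (pos < text.length && isSpace(text[pos])) pos++;
    if (pos >= text.length) break;

    let end = Math.min(pos + maxLen, text.length);
    let cutMidWord = false;
    if (end < text.length && !isSpace(text[end])) {
      // The limit falls inside a word: look back up to `lookback` chars for a whitespace.
      let brk = -1;
      for (let i = end - 1; i >= end - lookback && i > pos; i--) {
        if (isSpace(text[i])) { brk = i; break; }
      }
      if (brk !== -1) end = brk;  // break at the whitespace that was found
      else cutMidWord = true;     // none in range: force-cut at maxLen
    }
    const line = text.slice(pos, end).trimEnd();
    lines.push(cutMidWord && hyphenate ? line + "-" : line);
    pos = end;
  }
  return lines;
}

const edit1 = "Loremipsumissimplydummytextofthe printing and typesetting industry.";
console.log(wrapText(edit1, 30, 15, true).join("\n"));
// Loremipsumissimplydummytextoft-
// he printing and typesetting
// industry.
```

With `hyphenate` left at `false`, the second example text from the question should come out as the ten lines listed under "Should output".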

Final answer:

(\S[\s\S]{1,30}$|\S[\s\S]{1,29}(?:\s+)|\S{30})

Evolution:

  1. ([\s\S]{1,15}(?!\S)|\S{15,})

You just have to modify the answer in the link with an 'or' alternative that adds your additional requirement: |\S{15,}

  2. Responding to your edits, here is my modified regex: ([\s\S]{1,15}(?!\S)|\S{15})

You can replace the 15s with 30 or the character cutoff of your choice.

  3. Adjusting for your further clarifications: (\S[\s\S]{1,14}(?:\s*)|\S{15})

Now each match has to start with a non-whitespace character, and it also consumes any additional whitespace after the first 15 characters. The inner group is non-capturing, but that whitespace still falls inside the outer group, so it is part of the match. Again, you need to change the 15 and the 14 to the lengths you want.

  4. (\S[\s\S]{1,30}$|\S[\s\S]{1,29}(?:\s+)|\S{30}) Adding another alternative at the beginning of the 'or' statement, which captures the end of the string if it ends in a non-whitespace character. If it ends in a whitespace character, the second part of the 'or' statement captures it.
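The answer gives only the pattern, so here is one possible way to apply it. It is a sketch that assumes the pattern is used with the `g` flag and `String.prototype.match`; the names `finalRegex`, `input` and `wrapped` are made up for illustration. Matches produced by the second alternative include the whitespace at the break point, so each one is trimmed before joining.

```javascript
// The final pattern from the answer, with the global flag added.
const finalRegex = /(\S[\s\S]{1,30}$|\S[\s\S]{1,29}(?:\s+)|\S{30})/g;

const input = "Loremipsumissimplydummytextofthe printing and typesetting industry.";

// match() returns null when nothing matches, hence the fallback to [].
const wrapped = (input.match(finalRegex) || []).map((s) => s.trimEnd());

console.log(wrapped.join("\n"));
// Loremipsumissimplydummytextoft
// he printing and typesetting
// industry.
```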
