
Split large string by comma|semicolon in n-max-size chunks in JavaScript

I would like to split a large string at commas or semicolons into chunks of a maximum size n.

This similar question is very close to my situation, but what I really want is to split at a comma or semicolon, subject to an n_max_size limit.

My situation: I am using a Text-to-Speech service to convert text to voice. Because of a limit imposed by the service provider, each request can contain at most 100 words, so I have to split an article into several substrings. If I just split it into fixed-size pieces, the pauses and tone of the voice do not sound like a human's.

What would be the best way to do this in terms of performance?

From the comments I understand that you don't want to split at every comma or semicolon, but only when the maximum size is about to be reached. You also want to keep the delimiter (the comma or semicolon at which you split) in the result.

To add a max-size limit to the regular expression, you can use a regex like .{1,100} , where 100 is that maximum (for example). If your engine does not (yet) support the dotAll flag, use [^] instead of . to ensure that even newline characters are matched here.
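As a minimal sketch of that first step, the snippet below chunks a string purely by size, with no regard for delimiters yet. The helper name chunkFixed and the sample input are made up for illustration; [^] is used rather than . so that the newline counts like any other character.

```javascript
// Hypothetical helper (not part of the original answer): cut a string into
// pieces of at most `maxSize` characters, ignoring delimiters entirely.
// `|| []` covers the case where match() returns null (empty input).
const chunkFixed = (s, maxSize) =>
  s.match(new RegExp("[^]{1," + maxSize + "}", "g")) || [];

console.log(chunkFixed("one,two;three\nfour", 5));
// → [ 'one,t', 'wo;th', 'ree\nf', 'our' ]
```

The chunks respect the size limit, but they cut through words and clauses, which is exactly the problem described in the question.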

To ensure that the split happens just after a delimiter, append (.$|[,;]) to the regex, and reduce the previous {1,100} to {1,99} . The last character of each chunk is then either a delimiter or the final character of the string.
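The intermediate pattern from this step can be tried on its own. This is only a sketch: splitAtDelimiter is a hypothetical name, and the pattern assumes that every window of maxSize characters contains a comma or semicolon. When that assumption does not hold, match() silently skips the characters it cannot fit into a chunk.

```javascript
// Hypothetical helper: chunks of at most `maxSize` characters, each ending
// right after a comma or semicolon, or at the very end of the string.
const splitAtDelimiter = (s, maxSize) =>
  s.match(new RegExp("[^]{1," + (maxSize - 1) + "}(.$|[,;])", "g")) || [];

console.log(splitAtDelimiter("aa,bbb;cccc,dd;e", 10));
// → [ 'aa,bbb;', 'cccc,dd;e' ]
```

Because the quantifier is greedy, each chunk extends to the last delimiter that still fits within the limit, not to the first one.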

Then there is the case where a substring of 100 or more characters contains no delimiter: the following code will, as an exception, allow a longer chunk that runs until the next delimiter is found. You may want to add white space ( \s ) as a possible delimiter too.
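A possible variant with white space added as a delimiter is sketched below. It assumes that \s is simply added to both character classes of the answer's pattern; the name splitWithSpaces is hypothetical.

```javascript
// Hypothetical variant: white space (\s) counts as a delimiter alongside
// the comma and semicolon, so long clauses can be broken between words.
const splitWithSpaces = (s, maxSize = s.length) =>
  s.match(new RegExp(
    "(?=\\S)([^]{1," + (maxSize - 1) + "}|[^,;\\s]*)(.$|[,;\\s])", "g"
  )) || [];

console.log(splitWithSpaces(
  "hello,this is a longer sentence without commas;but no problem", 20));
// → [ 'hello,this is a ', 'longer sentence ', 'without commas;but ', 'no problem' ]
```

Every chunk now fits within 20 characters, at the cost of sometimes breaking at a space when a comma or semicolon occurred earlier in the same window.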

Here is a function that takes the size as an argument and creates the corresponding regex:

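 // Prefer a chunk of at most maxSize characters ending at a delimiter; otherwise run on to the next delimiter.
 const mySplit = (s, maxSize=s.length) => s.match(new RegExp("(?=\\S)([^]{1," + (maxSize-1) + "}|[^,;]*)(.$|[,;])", "g"));

 console.log(mySplit("hello,this is a longer sentence without commas;but no problem", 20));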
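A quick sanity check of the result might look like the sketch below. It repeats the function with one assumed addition, `|| []`, because match() returns null when nothing matches (for example on a string of only white space). The names text, chunks, and oversized are illustrative.

```javascript
// Same pattern as above; `|| []` is an added guard, since match() returns
// null when there is no match at all.
const mySplit = (s, maxSize = s.length) =>
  s.match(new RegExp(
    "(?=\\S)([^]{1," + (maxSize - 1) + "}|[^,;]*)(.$|[,;])", "g"
  )) || [];

const text = "hello,this is a longer sentence without commas;but no problem";
const chunks = mySplit(text, 20);

// For this input, nothing is lost: the chunks rejoin to the original text.
console.log(chunks.join("") === text); // true

// Chunks that exceed the limit because no delimiter was available in time.
const oversized = chunks.filter(c => c.length > 20);
console.log(oversized);
// → [ 'this is a longer sentence without commas;' ]
```

Listing the oversized chunks makes it easy to decide whether they need a fallback, such as splitting at white space, before being sent to a size-limited service.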
