
Split large string by comma|semicolon in n-max-size chunks in JavaScript

I would like to split a large string by comma|semicolon into n-max-size chunks.

A similar question comes very close to my situation, but what I really want is to split at a comma or semicolon, subject to an n_max_size limit.

My situation: I am using a Text-to-Speech service to convert text to voice, but the service provider limits each request to 100 words, so I have to split an article into several substrings. If I just split it into fixed-size pieces, the pauses and tone of the voice do not sound like a human's.

What would be the best way in terms of performance to do this?

From the comments I understand that you don't want to split at every comma or semicolon, but only when the maximum size is about to be reached. You also want to keep the delimiter (the comma or semicolon where the split happens) in the result.

To add a max-size limit to the regular expression, you can use a pattern like .{1,100}, where 100 is that maximum (for example). If your engine does not support the dotAll flag (yet), then use [^] instead of . to ensure that even newline characters are matched here.
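As a quick illustration of the newline point, here is a minimal sketch. The sample text and variable names are made up for the example.

```javascript
// Illustrative input: two lines separated by a newline.
const text = "first line\nsecond line";

// A plain dot stops at the newline, so the text is cut there.
const withDot = text.match(/.{1,100}/g);

// [^] matches any character, newlines included (a JavaScript-specific idiom).
const withAnyChar = text.match(/[^]{1,100}/g);

// With the dotAll flag (s, ES2018) the dot matches newlines as well.
const withDotAll = text.match(/.{1,100}/gs);

console.log(withDot);     // [ 'first line', 'second line' ]
console.log(withAnyChar); // [ 'first line\nsecond line' ]
console.log(withDotAll);  // [ 'first line\nsecond line' ]
```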

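To ensure that the split happens just after a delimiter, add (.$|[,;]) to the regex, and reduce the previous {1,100} to {1,99}, because the added group itself consumes one character (either a delimiter or the very last character of the string).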
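A small sketch of this step, assuming a limit of 10 characters (so the quantifier becomes {1,9}) and an input in which delimiters occur often enough. The sample string is invented for the example.

```javascript
// Assumed limit: 10 characters per chunk, hence {1,9} plus one closing character.
const sample = "one,two,three;four";

// Each match ends either on a delimiter or on the last character of the string.
const parts = sample.match(/[^]{1,9}(.$|[,;])/g);

console.log(parts); // [ 'one,two,', 'three;four' ]
```

Because the quantifier is greedy, each chunk extends to the last delimiter that still fits within the limit, not the first one.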

Then there is the case where a stretch of 100 or more characters contains no delimiter: the following code exceptionally allows a longer chunk in that case, extending it until a delimiter is found. You may want to add white space ( \s ) as a possible delimiter too.
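To see why that fallback matters, the sketch below compares the pattern with and without it, assuming a limit of 5 characters. The input is made up and starts with 8 characters that contain no delimiter.

```javascript
// Assumed limit of 5 characters; the first delimiter only comes after 8 characters.
const input = "abcdefgh,ij";

// Without a fallback the regex cannot match at the start of the string,
// so the engine moves forward and the leading "abcd" is silently dropped.
const strict = input.match(/[^]{1,4}(.$|[,;])/g);

// With the fallback alternative [^,;]* the chunk runs on until the next delimiter.
const lenient = input.match(/([^]{1,4}|[^,;]*)(.$|[,;])/g);

console.log(strict);  // [ 'efgh,', 'ij' ]
console.log(lenient); // [ 'abcdefgh,', 'ij' ]
```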

Here is a function that takes the size as argument and creates the corresponding regex:

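 // (?=\S) makes every chunk start at a non-whitespace character.
 // The first alternative keeps the chunk within maxSize; the second one
 // ([^,;]*) is the fallback that runs on until the next delimiter.
 const mySplit = (s, maxSize = s.length) =>
     s.match(new RegExp("(?=\\S)([^]{1," + (maxSize - 1) + "}|[^,;]*)(.$|[,;])", "g"));

 console.log(mySplit("hello,this is a longer sentence without commas;but no problem", 20));
 // [ "hello,", "this is a longer sentence without commas;", "but no problem" ]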
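A possible variant of the function above, offered as a sketch rather than a definitive implementation. It takes the delimiter set as a parameter so that white space can be added, and it returns an empty array when nothing matches. The names splitChunks and delimClass are hypothetical, and maxSize is assumed to be at least 2.

```javascript
// Hypothetical helper: delimClass is the body of a regex character class,
// e.g. ",;" or ",;\\s". It is inserted verbatim, so it must already be regex-safe.
const splitChunks = (s, maxSize, delimClass = ",;") => {
  const re = new RegExp(
    "(?=\\S)([^]{1," + (maxSize - 1) + "}|[^" + delimClass + "]*)" +
      "(.$|[" + delimClass + "])",
    "g"
  );
  // String.prototype.match returns null when nothing matches (e.g. an empty string).
  return s.match(re) || [];
};

const sentence = "hello,this is a longer sentence without commas;but no problem";

// Treat white space as a delimiter too, with an assumed limit of 20 characters.
const chunks = splitChunks(sentence, 20, ",;\\s");
console.log(chunks);
// [ 'hello,this is a ', 'longer sentence ', 'without commas;but ', 'no problem' ]

// Chunks that still exceed the limit (none here) can be detected afterwards.
const tooLong = chunks.filter((c) => c.length > 20);
console.log(tooLong); // []
```

With white space in the delimiter set, a chunk can only exceed the limit when a single word is longer than the limit, which makes the fallback much rarer for ordinary prose.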
