
Regex: split a string at a specific length and ignore incomplete words

I want to split text when its length reaches 30 characters, including spaces. My work so far:

var m = "Lorem ipsum dolor sit amet, co Lorem ipsum dolor sit amet, co Lorem ipsum dolor sit amet, co Lorem ipsum dolor sit amet, co";

var spacedM = m.split(' ');
var charCount = 0;

for (var i = 0; i < spacedM.length; i++) {
    charCount = charCount + spacedM[i].length + 0.5;

    if (charCount <= 30 && $('#address1').text().length <= 30) {
        $('#address1').append(spacedM[i] + ' ');
    } else if (charCount > 30 && charCount <= 60 && $('#address2').text().length <= 30) {
        $('#address2').append(spacedM[i] + ' ');
    } else if (charCount > 60 && charCount <= 90 && $('#address3').text().length <= 30) {
        $('#address3').append(spacedM[i] + ' ');
    }
}

$('#address1').append($('#address1').text().length);
$('#address2').append($('#address2').text().length);
$('#address3').append($('#address3').text().length);

//output
Lorem ipsum dolor sit amet, co 31
Lorem ipsum dolor sit amet, co 31
Lorem ipsum dolor sit amet, co 31

It looks OK, but it is also kind of a hack, isn't it? I welcome any suggestions to improve this solution, since this code will be used to split addresses in older data and map them into 3 address fields. Below is my jsfiddle: https://jsfiddle.net/u11p6xx4/4/

UPDATED: I do not want to split words, because a word in an address cannot be split into 2 parts if it is meant to be 1 word. So what I actually need is to split the address into lines of at most 30 characters without splitting any word. A line can be, say, 28 characters long, and the text then continues in #address2.

Example address: Blok 53-11-04 Apartment Flamingo, Keramat Jaya 2 Persiaran Gurney

Expected:

Blok 53-11-04 Apartment
Flamingo, Keramat Jaya 2
Persiaran Gurney

Why can't you just use a regex? Like:

var m = "Lorem ipsum dolor sit amet, co Lorem ipsum dolor sit amet, co Lorem ipsum dolor sit amet, co Lorem ipsum dolor sit amet, co";

var n = m.match(/.{31}/g);
$('#address1').append(n[0]);
$('#address2').append(n[1]);
$('#address3').append(n[2]);

$('#address1').append($('#address1').text().length);
$('#address2').append($('#address2').text().length);
$('#address3').append($('#address3').text().length);

// output
// Lorem ipsum dolor sit amet, co 31
// Lorem ipsum dolor sit amet, co 31
// Lorem ipsum dolor sit amet, co 31

But what happens if a 4th group is matched? Do you just ignore everything from the ((31*3)+1)-th character onward?
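One detail worth checking with a fixed-width pattern such as `.{31}`: it only matches complete chunks, so a shorter tail at the end of the string is not returned at all. A minimal sketch on a hypothetical 10-character sample (the sample string and variable names are made up for illustration):

```javascript
// Hypothetical sample text, 10 characters long.
const sample = 'abcdefghij';

// Fixed width: only complete 4-character chunks are matched, the tail 'ij' is lost.
const fixed = sample.match(/.{4}/g);      // → ['abcd', 'efgh']

// Bounded width: the last chunk may be shorter, so nothing is dropped.
const withTail = sample.match(/.{1,4}/g); // → ['abcd', 'efgh', 'ij']

console.log(fixed, withTail);
```

Note that `match` with the `g` flag returns `null` rather than an empty array when nothing matches, so indexing `n[0]` directly can throw on very short input.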

Update:

Try using the regex /[^\W].{1,30}(?:\s|$)/g. You'll still need to improve it, but it should get you started:

var m = "Blok 53-11-04 Apartment Flamingo, Keramat Jaya 2 Persiaran Gurney";

var n = m.match(/.{1,30}(?:\s|$)/g); // or /[^\W].{1,30}(?:\s|$)/g

$('#address1').append(n[0]);
$('#address2').append(n[1]);
$('#address3').append(n[2]);

// output
// Blok 53-11-04 Apartment
// Flamingo, Keramat Jaya 2
// Persiaran Gurney

You can experiment here: https://regex101.com/r/TIRa6L/2
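To make the limit and the number of fields configurable, the same idea could be wrapped in a small function. This is only a sketch under some assumptions: `splitAddress`, `fields` and `overflow` are hypothetical names, the pattern is a variation on the one above (it uses a lookahead so the trailing space is not part of the match), and a single word longer than the limit is kept whole on its own line rather than cut.

```javascript
// Hypothetical helper, not part of the original answer.
function splitAddress(text, maxLen, maxFields) {
    // \S.{0,maxLen-1} : a line that starts on a non-space character, at most maxLen long
    // (?=\s|$)        : the line must end right before whitespace or the end of the text
    // |\S+            : fallback so a single word longer than maxLen is kept whole
    const pattern = new RegExp('\\S.{0,' + (maxLen - 1) + '}(?=\\s|$)|\\S+', 'g');

    // match() returns null when nothing matches, so fall back to an empty array.
    const lines = (text.match(pattern) || []).map(function (line) {
        return line.trim();
    });

    return {
        fields: lines.slice(0, maxFields),  // what fits into the available fields
        overflow: lines.slice(maxFields)    // anything left over, so it is not silently lost
    };
}

const result = splitAddress(
    'Blok 53-11-04 Apartment Flamingo, Keramat Jaya 2 Persiaran Gurney', 30, 3);
console.log(result.fields);
console.log(result.overflow);
```

Returning the overflow separately leaves the decision about a 4th chunk (ignore it, log it, flag the record for review) to the caller.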

If you want a more reliable approach, try a so-called "address verification API". It should be able to parse a one-line address and convert it into the correct multi-line format.

Using a for loop, like your original post. I'm not sure what your requirements are as far as truncation goes, though. This snippet does not care about truncating words; it just splits at every 30 chars.

// goal is to split text when length is 30 including space
var m = "Lorem ipsum dolor sit amet, co Lorem ipsum dolor sit amet, co Lorem ipsum dolor sit amet, co Lorem ipsum dolor sit amet, co";

var chars = m.split('');//one array entry per character
var charCount = 0;
var theString = "";
var theStrings = [];
for(var b=0; b < chars.length; b++)
{
    theString = theString + chars[b];
    if(charCount == 29)
    {
        theStrings.push(theString);//add this string to the array of strings
        theString = "";//reset theString
        charCount = 0;//reset the charCount
    }
    else
    {
        charCount++;//increment the charCount
    }
}
if(theString.length > 0)
{
    theStrings.push(theString);//keep the remainder that is shorter than 30 chars
}

for(var i=0; i < theStrings.length ;i++)
{
    console.log(theStrings[i]);
}
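Since the updated question asks that words are never split, the same loop idea can be applied to words instead of characters. The following is a minimal sketch, not the code from the answer above: `wrapWords` is a hypothetical name, words are assumed to be separated by whitespace, and a single word longer than the limit is placed on its own line uncut.

```javascript
// Hypothetical helper: greedy word wrap, assuming whitespace-separated words.
function wrapWords(text, maxLen) {
    const words = text.split(/\s+/).filter(Boolean); // drop empty entries from leading/trailing spaces
    const lines = [];
    let current = '';

    for (const word of words) {
        if (current === '') {
            current = word;                 // first word on a line is always accepted
        } else if (current.length + 1 + word.length <= maxLen) {
            current += ' ' + word;          // word still fits, counting the joining space
        } else {
            lines.push(current);            // line is full: store it and start a new one
            current = word;
        }
    }
    if (current !== '') {
        lines.push(current);                // do not lose the last, partially filled line
    }
    return lines;
}

console.log(wrapWords('Blok 53-11-04 Apartment Flamingo, Keramat Jaya 2 Persiaran Gurney', 30));
```

For the example address this yields the three lines listed as expected in the question; the caller still has to decide what to do if more than three lines come back.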

You can use a regexp to match it: https://regex101.com/r/IszFAZ/1

It also lets the last word of each chunk have any length.

var m = "Lorem ipsum dolor sit amet, c1 Lorem ipsum dolor sit amet, co2 Lorem ipsum dolor sit amet, coo3 Lorem ipsum dolor sit amet, c4";
console.log(m.match(/(?!\s).{30,}?(?=\s|$)/g));
