
RegEx Data Values JavaScript White Space

I am trying to add the correct white space to data I am receiving. Currently it shows like this:

NotStarted

ReadyforPPPDReview

This is the code I am using:

.replace(/([A-Z])/g, '$1')

"NotStarted" correctly shows "Not Started", but "ReadyforPPPDReview" shows "Readyfor PPPD Review" when it should look like this: "Ready for PPPD Review".

What is the best way to handle both of these using one regex or function?

You would need an NLP engine to handle this properly. Here are two approaches with simple regex; both have limitations:

1. Use a list of stop words

We blindly add spaces before and after the stop words:

    var str = 'NotStarted, ReadyforPPPDReview';
    var wordList = 'and, for, in, on, not, review, the'; // stop words
    var wordListRe = new RegExp('(' + wordList.replace(/, */g, '|') + ')', 'gi');
    var result1 = str
      .replace(wordListRe, ' $1 ')         // add space before and after stop words
      .replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower case and upper case chars
      .replace(/ +/g, ' ')                 // remove excessive spaces
      .trim();                             // remove spaces at start and end
    console.log('str: ' + str);
    console.log('result1: ' + result1);

As you can imagine, the stop words approach has some severe limitations. For example, the words "formula input" would result in "for mula in put".

2. Use a mapping table

The mapping table lists words that need to be spaced out (no drugs involved), as in this code snippet:
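To see that limitation concretely, the stop-word steps can be wrapped in a function and run on a couple of inputs. This is only a minimal sketch: the name `spaceByStopWords` is an assumption made for illustration and is not part of the answer, and it assumes the stop words contain no regex special characters.

```javascript
// Hypothetical helper wrapping the stop-word approach; the name is illustrative only.
// Assumes the stop words contain no regex special characters (they are not escaped).
function spaceByStopWords(str, stopWords) {
  // Build an alternation such as (and|for|in), matched case-insensitively.
  var re = new RegExp('(' + stopWords.join('|') + ')', 'gi');
  return str
    .replace(re, ' $1 ')                 // space before and after every stop word
    .replace(/([a-z])([A-Z])/g, '$1 $2') // space between lower case and upper case chars
    .replace(/ +/g, ' ')                 // collapse runs of spaces
    .trim();                             // drop spaces at start and end
}

var stopWords = ['and', 'for', 'in', 'on', 'not', 'review', 'the'];
console.log(spaceByStopWords('ReadyforPPPDReview', stopWords)); // Ready for PPPD Review
console.log(spaceByStopWords('formula input', stopWords));      // for mula in put
```

The second call shows the problem: "for" and "in" are matched inside longer words because the regex has no notion of word boundaries in run-together text.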

    var str = 'NotStarted, ReadyforPPPDReview';
    var spaceWordMap = {
      NotStarted: 'Not Started',
      Readyfor: 'Ready for',
      PPPDReview: 'PPPD Review'
      // add more as needed
    };
    var spaceWordMapRe = new RegExp('(' + Object.keys(spaceWordMap).join('|') + ')', 'gi');
    var result2 = str
      .replace(spaceWordMapRe, function(m, p1) {
        // m: matched snippet, p1: first group
        // replace key in spaceWordMap with its value; keep the match unchanged
        // if its case differs from the key (the regex is case-insensitive)
        return spaceWordMap[p1] || m;
      })
      .replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower case and upper case chars
      .replace(/ +/g, ' ')                 // remove excessive spaces
      .trim();                             // remove spaces at start and end
    console.log('str: ' + str);
    console.log('result2: ' + result2);

This approach is suitable if you have a deterministic list of words as input.
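If the incoming values are a fixed set of status strings, one possible way to package this as a single function is an exact lookup with a case-boundary fallback for values not in the table. The sketch below is an assumption-laden illustration, not the answer's code: `knownLabels` and `toLabel` are hypothetical names, and the fallback cannot split all-lowercase joins such as "Readyfor", which is why that value sits in the table.

```javascript
// Hypothetical lookup table and helper; both names are illustrative only.
var knownLabels = {
  NotStarted: 'Not Started',
  ReadyforPPPDReview: 'Ready for PPPD Review'
};

function toLabel(value) {
  // Exact, case-sensitive lookup first; hasOwnProperty skips inherited keys.
  if (Object.prototype.hasOwnProperty.call(knownLabels, value)) {
    return knownLabels[value];
  }
  // Fallback for unknown values: split on case boundaries only.
  return value
    .replace(/([a-z])([A-Z])/g, '$1 $2')       // lower case followed by upper case
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2') // acronym followed by a capitalized word
    .trim();
}

console.log(toLabel('NotStarted'));         // Not Started
console.log(toLabel('ReadyforPPPDReview')); // Ready for PPPD Review
console.log(toLabel('InProgress'));         // In Progress
console.log(toLabel('PPPDReview'));         // PPPD Review
```

The table stays authoritative for known values, while the fallback only keeps unknown values readable; it makes no attempt to detect word boundaries that are not marked by a change of case.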
