REGEX - Match a pattern in a lengthy string
I am trying to match a specific pattern in a long string:
NEW ZEALAND AND (data.operator1:"SHELL AND AMP" AND data.field:"NEW ZEALAND") OR (data.operator: purpose AND data.field:crank) OR (data.operator:REGULATOR AND data.field:HELICOPTOR )
Basically it is a combination of /(?<?AND)(?<?OR)\s+(:!AND)(?!OR)/g and :" [a-zA-Z ] "
I want to convert the string to title case so that AND/OR/NOT stand out clearly. The expected result is:
New Zealand AND (data.operator1:"Shell And Amp" AND data.field:"New Zealand") OR (data.operator: purpose AND data.field:crank) OR (data.operator:Regulator AND data.field:Helicoptor )
You can easily express a lexer as a regular expression with named groups, for example:
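// One named group per token type. The layout whitespace is stripped before use.
// \b keeps the operator group from matching the start of words such as "ORANGE".
const MY_LEXER = String.raw`
    (?<string>   "[^"]*")
    |
    (?<operator> \b(?:and|or|AND|OR)\b)
    |
    (?<word>     \w+)
    |
    (?<punct>    [().:])
    |
    (?<ws>       \s+)
`;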
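To make the named-group idea concrete, here is a minimal sketch of how such a multi-line pattern collapses into a single alternation and how `groups` reports which alternative matched. The names `DEMO_LEXER`, `demoSource`, `demoRegex` and `matchedGroup` are hypothetical and only serve this illustration; the lexer is a shortened variant of the one above.

```javascript
// Hypothetical, shortened lexer with the same shape as MY_LEXER.
const DEMO_LEXER = String.raw`
    (?<string>   "[^"]*")
    |
    (?<operator> \b(?:and|or|AND|OR)\b)
    |
    (?<word>     \w+)
`;

// Removing all layout whitespace leaves one alternation of named groups.
const demoSource = DEMO_LEXER.replace(/\s+/g, '');
const demoRegex = new RegExp(demoSource, 'g');

// For each match, exactly one named group is defined; return it as [name, value].
const matchedGroup = (m) =>
    Object.entries(m.groups).find(([, value]) => value !== undefined);

const firstMatches = [...'NEW AND "x y"'.matchAll(demoRegex)].map(matchedGroup);
console.log(firstMatches);
```

Because every whitespace character is stripped from the pattern source, a literal space inside the lexer has to be written as `\s` or `\x20`. Alternatives are tried left to right, so more specific groups such as `operator` should come before `word`.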
The next function takes a string and a lexer and returns a list of [token-type, token-value] pairs:
let tokenize = (str, lexer) =>
    [...str.matchAll(lexer.replace(/\s+/g, ''))]
        .flatMap(m =>
            Object
                .entries(m.groups)
                .filter(p => p[1]))
The result will look like
[ 'word', 'NEW' ],
[ 'ws', ' ' ],
[ 'word', 'ZEALAND' ],
[ 'ws', ' ' ],
[ 'operator', 'AND' ],
[ 'ws', ' ' ],
[ 'punct', '(' ],
and so on. Now it should be possible to iterate over the tokens, convert the values as needed, and put them back together:
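// myString holds the input query; output collects the rebuilt string.
let output = '';
for (let [type, val] of tokenize(myString, MY_LEXER)) {
    // Only quoted strings and plain words are converted; operators stay as they are.
    if (type === 'string' || type === 'word')
        val = val.toLowerCase();
    output += val;
}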
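The loop above lowercases the converted tokens. For the title-case output asked for in the question, one possible sketch is shown here. It assumes a "soft" title-case rule (keep the first character of each word, lowercase the rest), which is an assumption inferred from the expected output, where `data` and `purpose` stay lowercase. The names `QUERY_LEXER`, `tokenizeQuery`, `softTitleCase` and `titleCaseQuery` are hypothetical.

```javascript
// Same token classes as MY_LEXER; \b stops operators matching inside longer words.
const QUERY_LEXER = String.raw`
    (?<string>   "[^"]*")
    |
    (?<operator> \b(?:and|or|AND|OR)\b)
    |
    (?<word>     \w+)
    |
    (?<punct>    [().:])
    |
    (?<ws>       \s+)
`;

// Returns [type, value] pairs. Characters that no group matches are skipped.
const tokenizeQuery = (str, lexer) =>
    [...str.matchAll(new RegExp(lexer.replace(/\s+/g, ''), 'g'))]
        .flatMap(m => Object.entries(m.groups).filter(([, v]) => v !== undefined));

// Assumed rule: keep each word's first character as is, lowercase the remainder.
const softTitleCase = (text) =>
    text.replace(/\w+/g, w => w[0] + w.slice(1).toLowerCase());

// Rebuild the query, converting only string and word tokens.
const titleCaseQuery = (query) => {
    let output = '';
    for (const [type, val] of tokenizeQuery(query, QUERY_LEXER)) {
        output += (type === 'string' || type === 'word') ? softTitleCase(val) : val;
    }
    return output;
};

const sample = 'NEW ZEALAND AND (data.field:"NEW ZEALAND") OR (data.operator:REGULATOR )';
console.log(titleCaseQuery(sample));
// → New Zealand AND (data.field:"New Zealand") OR (data.operator:Regulator )
```

Since `matchAll` silently skips anything the lexer does not cover (a comma or an unclosed quote, for instance), it may be worth comparing the joined token values with the input before trusting the rebuilt string.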