REGEX - Match a pattern in a lengthy string

I am trying to match a particular pattern in a lengthy string:

NEW ZEALAND AND (data.operator1:"SHELL AND AMP" AND data.field:"NEW ZEALAND") OR (data.operator:purpose AND data.field:crank) OR (data.operator:REGULATOR AND data.field:HELICOPTOR)

  1. I want to select all the values in this string (including those that follow a ":"), but not the AND/OR/NOT operators.
  2. I am trying to use lookahead and lookbehind in the regex, but I have been unable to get it to work.

What I have tried is basically a combination of /(?<?AND)(?<?OR)\s+(:!AND)(?!OR)/g and :" [a-zA-Z ] "

I want to change the values to title case so that the AND/OR/NOT operators clearly stand out. The expected result is:

New Zealand AND (data.operator1:"Shell And Amp" AND data.field:"New Zealand") OR (data.operator:purpose AND data.field:crank) OR (data.operator:Regulator AND data.field:Helicoptor)

Rather than fighting with lookarounds, you can write a small lexer. Lexers are easy to express as a regular expression with one named group per token type, for example:

const MY_LEXER = String.raw`
    (?<string> "[^"]*")
    |
    (?<operator> \b(?:and|or|not|AND|OR|NOT)\b)
    |
    (?<word> \w+)
    |
    (?<punct> [().:])
    |
    (?<ws> \s+)
`

The following function takes a string and a lexer and returns a list of [token-type, token-value] pairs:

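// Build a global RegExp from the lexer (the whitespace in the template is
// only there for readability) and return [token-type, token-value] pairs.
let tokenize = (str, lexer) =>
    [...str.matchAll(new RegExp(lexer.replace(/\s+/g, ''), 'g'))]
        .flatMap(m =>
            Object
                .entries(m.groups)
                // exactly one named group is defined for each match
                .filter(p => p[1] !== undefined))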
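One thing to keep in mind with this approach is that matchAll silently skips any character that none of the alternatives match (a comma or a hyphen, for instance), so those characters would be lost when the tokens are joined back together. A minimal sketch of a coverage check follows; findUnmatched is a hypothetical helper name, and the regex is a condensed stand-in for the lexer above, not the lexer itself:

```javascript
// Condensed stand-in for the lexer: strings, words, punctuation, whitespace.
// (Operators are covered by \w+ here, since only coverage matters.)
const COVERAGE_RE = /"[^"]*"|\w+|[().:]|\s+/g;

// Hypothetical helper: returns the pieces of `str` that no token matched.
function findUnmatched(str) {
    const gaps = [];
    let pos = 0;
    for (const m of str.matchAll(COVERAGE_RE)) {
        // Anything between the previous match and this one was skipped.
        if (m.index > pos) gaps.push(str.slice(pos, m.index));
        pos = m.index + m[0].length;
    }
    // Trailing text after the last match was skipped too.
    if (pos < str.length) gaps.push(str.slice(pos));
    return gaps;
}
```

If the returned array is non-empty, the lexer probably needs another alternative (or a catch-all group) before the tokens can be safely reassembled.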

For the string from the question, the result starts like this:

  [ 'word', 'NEW' ],
  [ 'ws', ' ' ],
  [ 'word', 'ZEALAND' ],
  [ 'ws', ' ' ],
  [ 'operator', 'AND' ],
  [ 'ws', ' ' ],
  [ 'punct', '(' ],

etc. Now you can iterate over the tokens, transform the values as you need, and put them back together:

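let output = '';
for (let [type, val] of tokenize(myString, MY_LEXER)) {
    // operators, punctuation and whitespace pass through unchanged
    if (type === 'string' || type === 'word')
        val = val.toLowerCase();  // replace with whatever transformation you need
    output += val;
}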
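For the title-case result asked for in the question, a self-contained sketch might look like the following. It rests on an assumption read off the expected output rather than stated in the question: only words written entirely in upper case are converted, so data.operator1, purpose and crank stay as they are. The names titleCaseUpper and normalizeQuery are hypothetical, and the lexer here adds word boundaries and NOT to the operator group.

```javascript
const QUERY_LEXER = String.raw`
    (?<string> "[^"]*")
    |
    (?<operator> \b(?:and|or|not|AND|OR|NOT)\b)
    |
    (?<word> \w+)
    |
    (?<punct> [().:])
    |
    (?<ws> \s+)
`;

// Returns [token-type, token-value] pairs for every match of the lexer.
const tokenizeQuery = (str, lexer) =>
    [...str.matchAll(new RegExp(lexer.replace(/\s+/g, ''), 'g'))]
        .flatMap(m => Object.entries(m.groups).filter(([, v]) => v !== undefined));

// Assumption: only all-uppercase words are title-cased ("NEW" -> "New");
// words containing lowercase letters or digits are left untouched.
const titleCaseUpper = s =>
    s.replace(/\b[A-Z]+\b/g, w => w[0] + w.slice(1).toLowerCase());

function normalizeQuery(query) {
    let output = '';
    for (const [type, val] of tokenizeQuery(query, QUERY_LEXER)) {
        // Only strings and words are transformed; operators, punctuation
        // and whitespace are copied through unchanged.
        output += (type === 'string' || type === 'word') ? titleCaseUpper(val) : val;
    }
    return output;
}

const sample = 'NEW ZEALAND AND (data.operator1:"SHELL AND AMP" AND data.field:"NEW ZEALAND") OR (data.operator:REGULATOR AND data.field:HELICOPTOR)';
console.log(normalizeQuery(sample));
// New Zealand AND (data.operator1:"Shell And Amp" AND data.field:"New Zealand") OR (data.operator:Regulator AND data.field:Helicoptor)
```

Because the string alternative comes first in the lexer, an AND inside quotes is treated as part of the value and becomes "And", while a bare AND between clauses is kept as an operator.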
