What is the regex to match this string?

Consider these sentences:

apple is 2kg
apple banana mango is 2kg
apple apple apple is 6kg
banana banana banana is 6kg

Given that "apple", "banana", and "mango" are the only fruits, what would be the regex to extract the fruit name(s) that appear at the start of the sentence?

I wrote this regex ( https://regex101.com/r/fY8bK1/1 ):

^(apple|mango|banana) is (\d+)kg$  

but this only matches if a single fruit is in the sentence.

How do I extract all the fruit names?

The expected output, for all 4 sentences, should be:

apple, 2
apple banana mango, 2
apple apple apple, 6
banana banana banana, 6

You can use grouping like this:

^((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) is (\d+)kg$

See regex demo

The (?:...) is a non-capturing group inside a capturing group ((...)), so it does not clutter the output with extra captures.

The ((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) group matches:

  • (?:apple|mango|banana) - any value from the alternative list delimited with the alternation operator |. If you plan to match whole words only, put \b at both ends of the subpattern.
  • (?:\s+(?:apple|mango|banana))* - 0 or more sequences of...
    • \s+ - 1 or more whitespace characters
    • (?:apple|mango|banana) - any of the alternatives.

Snippet:

var re = /^((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) is (\d+)kg$/gm;
var str = 'apple is 2kg\napple banana mango is 2kg\napple apple apple is 6kg\nbanana banana banana is 6kg';
var m;
// With the g flag, each exec call resumes after the previous match.
while ((m = re.exec(str)) !== null) {
  document.write(m[1] + "," + m[2] + "<br/>");
}
// Fruit names glued together without whitespace must not match.
document.write("<b>appleapple is 2kg</b> matched: " +
  /^((?:apple|mango|banana)(?:\s+(?:apple|mango|banana))*) is (\d+)kg$/.test("appleapple is 2kg"));
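Because the fruit alternation appears twice in that pattern, it can also be assembled from a list. The following is only a sketch: buildFruitRegex is a hypothetical helper name, and it assumes the fruit names contain no regex metacharacters (they would need escaping otherwise).

```javascript
// Hypothetical helper: builds the pattern from a list of fruit names.
// Assumption: the names contain no regex metacharacters.
function buildFruitRegex(fruits) {
  const alt = '(?:' + fruits.join('|') + ')';
  // One fruit, then zero or more whitespace-separated fruits, then " is <n>kg".
  return new RegExp('^(' + alt + '(?:\\s+' + alt + ')*) is (\\d+)kg$', 'gm');
}

const fruitRe = buildFruitRegex(['apple', 'mango', 'banana']);
const text = 'apple is 2kg\napple banana mango is 2kg\napple apple apple is 6kg\nbanana banana banana is 6kg';

const results = [];
let found;
while ((found = fruitRe.exec(text)) !== null) {
  results.push(found[1] + ', ' + found[2]);
}
console.log(results.join('\n'));
```

Note that backslashes are doubled inside the string passed to new RegExp, which is not needed in a regex literal.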

Try this

var re = /^((?:(?:apple|banana|mango)(?= ) ?)+) is (\d+)kg$/gm;

re.exec('apple banana mango is 2kg');
// ["apple banana mango is 2kg", "apple banana mango", "2"]

What makes this different from the other answers? The (?= ) ? after the fruit alternatives requires a space as the next character, but that space is only captured when more fruits follow (or when you put two spaces before the is).
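To see the effect of the lookahead, here is a small sketch using the same pattern without flags. The three test strings are made up for illustration.

```javascript
const lookaheadRe = /^((?:(?:apple|banana|mango)(?= ) ?)+) is (\d+)kg$/;

// Normal case: the space before "is" is given back, so the capture has no trailing space.
const single = lookaheadRe.exec('apple banana mango is 2kg');

// Two spaces before "is": the first one ends up inside the capture.
const doubled = lookaheadRe.exec('apple  is 2kg');

// No space after the first "apple": the lookahead fails, so there is no match.
const glued = lookaheadRe.exec('appleapple is 2kg');

console.log(JSON.stringify(single[1]));  // "apple banana mango"
console.log(JSON.stringify(doubled[1])); // "apple "
console.log(glued);                      // null
```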

Use this in a while loop to get all the results from a multi-line string.
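A minimal sketch of such a loop, assuming the four sentences from the question joined with newlines (the variable names are illustrative):

```javascript
const re = /^((?:(?:apple|banana|mango)(?= ) ?)+) is (\d+)kg$/gm;
const str = [
  'apple is 2kg',
  'apple banana mango is 2kg',
  'apple apple apple is 6kg',
  'banana banana banana is 6kg'
].join('\n');

const rows = [];
let match;
// With the g flag, exec continues from re.lastIndex and returns null when done.
while ((match = re.exec(str)) !== null) {
  rows.push([match[1], match[2]]);
}
console.log(rows);
```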


The gm flags here let this RegExp be applied to the same String multiple times using re.exec, with ^ and $ matching at the start and end of each line. However, the g flag causes str.match to behave differently: it returns every full match and omits the capture groups.
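The difference can be sketched as follows. The two-line input is made up for illustration, and matchAll is shown only as one more option; it requires an ES2020 environment.

```javascript
const input = 'apple is 2kg\nbanana banana banana is 6kg';
const withG = /^((?:(?:apple|banana|mango)(?= ) ?)+) is (\d+)kg$/gm;
const withoutG = /^((?:(?:apple|banana|mango)(?= ) ?)+) is (\d+)kg$/m;

// With g: an array of whole matches, no capture groups.
const allMatches = input.match(withG);

// Without g: only the first match, but with its capture groups.
const firstOnly = input.match(withoutG);

// matchAll (needs the g flag) yields every match together with its groups.
const groups = Array.from(input.matchAll(withG), (m) => [m[1], m[2]]);

console.log(allMatches);
console.log(firstOnly[1], firstOnly[2]);
console.log(groups);
```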

If you want an independent test for each string, you could continue using re.exec, or remove these flags and use str.match instead:

var re = /^((?:(?:apple|banana|mango)(?= ) ?)+) is (\d+)kg$/; // notice flags gone

'apple banana mango is 2kg'.match(re);
// ["apple banana mango is 2kg", "apple banana mango", "2"]

/^(((apple|mango|banana)\s*)+) is (\d+)kg$/$1,$4/gm

DEMO: https://regex101.com/r/sA4aW7/2
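The line above is written as a regex101-style substitution (pattern, replacement, flags). In JavaScript, one way to apply it is String.prototype.replace; this is a sketch using the sentences from the question:

```javascript
const subRe = /^(((apple|mango|banana)\s*)+) is (\d+)kg$/gm;
const lines = 'apple is 2kg\napple banana mango is 2kg\napple apple apple is 6kg\nbanana banana banana is 6kg';

// $1 is the whole fruit list, $4 is the weight.
const replaced = lines.replace(subRe, '$1,$4');
console.log(replaced);
```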

So you start from here, one of:

(apple|mango|banana)

Let's allow for the optional whitespace separating repetitions:

(apple|mango|banana)\s*

and all (at least one) of the repetitions:

((apple|mango|banana)\s*)+

You need to add an additional group, because you want a single group capturing the lot:

(((apple|mango|banana)\s*)+)

At this point, $1 (the outermost group) will contain "banana banana banana ..."; the fourth group, your weight. Add your own ?: to avoid capturing the inner groups if you like.
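As a sketch of what adding ?: changes, compare the group numbering of the two variants below. The variable names are illustrative.

```javascript
const capturing = /^(((apple|mango|banana)\s*)+) is (\d+)kg$/;
const nonCapturing = /^((?:(?:apple|mango|banana)\s*)+) is (\d+)kg$/;

const sentence = 'banana banana banana is 6kg';
const a = capturing.exec(sentence);
const b = nonCapturing.exec(sentence);

// Capturing inner groups: the weight is group 4.
console.log(a.length, a[1], a[4]);
// Non-capturing inner groups: the weight moves up to group 2.
console.log(b.length, b[1], b[2]);
```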

^((?:apple|mango|banana| )+) is (\d+)kg\s?$/gmi

DEMO: https://regex101.com/r/dO1rR7/1


Explanation

^((?:apple|mango|banana| )+) is (\d+)kg\s?$/gmi

^ assert position at start of a line
1st Capturing group ((?:apple|mango|banana| )+)
    (?:apple|mango|banana| )+ Non-capturing group
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
        1st Alternative: apple
            apple matches the characters apple literally (case sensitive)
        2nd Alternative: mango
            mango matches the characters mango literally (case sensitive)
        3rd Alternative: banana
            banana matches the characters banana literally (case sensitive)
        4th Alternative:  
             matches the character  literally
 is matches the characters  is literally (case sensitive)
2nd Capturing group (\d+)
    \d+ match a digit [0-9]
        Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
kg matches the characters kg literally (case sensitive)
\s? match any white space character [\r\n\t\f ]
    Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
$ assert position at end of a line
g modifier: global. All matches (don't return on first match)
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
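Because the space is one of the alternatives and the i flag is set, this pattern is more permissive than the others in this thread. The sketch below uses made-up inputs and drops the g flag, since each string is tested on its own.

```javascript
const looseRe = /^((?:apple|mango|banana| )+) is (\d+)kg\s?$/mi;

const normal = looseRe.exec('apple banana mango is 2kg');
const mixedCase = looseRe.exec('Apple Banana is 2KG');
const glued = looseRe.exec('appleapple is 2kg');
const onlySpace = looseRe.exec('  is 2kg'); // two leading spaces, no fruit

console.log(JSON.stringify(normal[1]));    // "apple banana mango"
console.log(JSON.stringify(mixedCase[1])); // "Apple Banana"
console.log(JSON.stringify(glued[1]));     // "appleapple"
console.log(JSON.stringify(onlySpace[1])); // " "
```

If fruit names must be separated by whitespace, or at least one fruit must be present, one of the stricter patterns above may be a better fit.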
