
Matching the smallest possible group with a Java regexp

I am trying to figure out how to get this regex to work the way I need it to. Basically, I have a bunch of songs with their lyrics. I am looping through each song's lyrics to see whether they match the search phrase I am looking for, and I return the length of the matched string as a kind of match rating.

For example, here is part of the lyrics of one song:

 "you gave up the love you got & that is that
 she loves me now she loves you not and that where its at"

I am using this regexp to find a match:

 (?mi)(\bShe\b).*(\bloves\b).*(\byou\b)

However, it captures this group:

 "she loves me now she loves you"

I want to capture the smallest group possible, which would be just "she loves you".

How can I make this capture the smallest group possible?

Some of my code is below. I take in the phrase and split it into an array, then check that the lyrics contain each word; otherwise we can bail out. I then build a string that will become the regex.

 static int rankPhrase(String lyrics, String lyricsPhrase){
    //This takes in song lyrics and the phrase we are searching for

    //Split the phrase up into separate words
    String[] phrase = lyricsPhrase.split("[^a-zA-Z]+");

    //Start to build the regex
    StringBuilder regex = new StringBuilder("(?im)"+"(\\" + "b" + phrase[0] + "\\b)");

    //loop through each word in the phrase
    for(int i = 1; i < phrase.length; i++){

        //Check to see if this word exists in the lyrics first
        if(lyrics.contains(phrase[i])){

            //add this to the regex we will search for
            regex.append(".*(\\b" + phrase[i] + "\\b)");

        }else{
            //if the song isn't found return the rank of 
            //-1 this means song doesn't contain phrase
            return -1;
        }

    }

    //Create the pattern
    Pattern p = Pattern.compile(regex.toString());
    Matcher m = p.matcher(lyrics);


    //Check to see if it can find a match
    if(m.find()){

        //Store this match in a string
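        String match = m.group();

        //Return the length of the match as its rank
        return match.length();
    }

    //No match was found
    return -1;
 }

(\bShe\b)(?:(?!\b(?:she|loves|you)\b).)*(\bloves\b)(?:(?!\b(?:she|loves|you)\b).)*(\byou\b)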

You can make use of a negative lookahead here. See the demo.

https://regex101.com/r/hE4jH0/11

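For Java, escape the backslashes, and keep the (?mi) flags from the question (or pass Pattern.CASE_INSENSITIVE) so that She also matches the lowercase lyrics: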

(\\bShe\\b)(?:(?!\\b(?:she|loves|you)\\b).)*(\\bloves\\b)(?:(?!\\b(?:she|loves|you)\\b).)*(\\byou\\b)
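Since the question builds its regex from an arbitrary phrase, the same idea can be generated rather than hard-coded. The following is only a sketch under a few assumptions: the class name TemperedPhrase and the helpers build and firstMatch are made up for illustration, the words are quoted with Pattern.quote, and case-insensitive matching is switched on through flags. Every gap between two words refuses to step over any whole word of the phrase, so a match cannot restart halfway through.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemperedPhrase {

    // Hypothetical helper: builds (word1) gap (word2) gap (word3) ...,
    // where a gap may not contain any phrase word as a whole word.
    static Pattern build(String... words) {
        StringBuilder alternation = new StringBuilder();
        for (String w : words) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(w));
        }
        // Tempered dot: consume one character only if no phrase word starts here.
        String gap = "(?:(?!\\b(?:" + alternation + ")\\b).)*";

        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append(gap);
            }
            regex.append("(\\b").append(Pattern.quote(words[i])).append("\\b)");
        }
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    }

    // Returns the first match of the tempered pattern, or null if there is none.
    static String firstMatch(String lyrics, String... words) {
        Matcher m = build(words).matcher(lyrics);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String lyrics = "you gave up the love you got & that is that\n"
                + "she loves me now she loves you not and that where its at";
        System.out.println(firstMatch(lyrics, "She", "loves", "you")); // she loves you
    }
}
```

In rankPhrase, the length of the string returned by firstMatch would then serve as the rank.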

Java's regex matcher only works in the forward direction. What you need to do is iterate over all of the matches found and choose the shortest one.
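One way to read this suggestion is sketched below; it is an interpretation, not necessarily what the answerer had in mind. A plain find() loop only returns non-overlapping matches, so it would never see the shorter match nested inside the longer one. The sketch therefore uses reluctant .*? gaps and restarts the search one character past the start of each match with Matcher.find(int). The class name ShortestMatch and the method shortestMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortestMatch {

    // Hypothetical helper: tries every possible start position and keeps
    // the shortest match of word1 .*? word2 .*? word3 ...
    static String shortestMatch(String lyrics, String... words) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append(".*?"); // reluctant gap: shortest match for a given start
            }
            regex.append("\\b").append(Pattern.quote(words[i])).append("\\b");
        }
        Pattern p = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(lyrics);

        String best = null;
        int from = 0;
        // find(int) resets the matcher and searches from the given index,
        // so overlapping candidates are also visited.
        while (from <= lyrics.length() && m.find(from)) {
            String candidate = m.group();
            if (best == null || candidate.length() < best.length()) {
                best = candidate;
            }
            from = m.start() + 1;
        }
        return best; // null when the phrase does not occur
    }

    public static void main(String[] args) {
        String lyrics = "she loves me now she loves you not and that where its at";
        System.out.println(shortestMatch(lyrics, "She", "loves", "you")); // she loves you
    }
}
```

Because . does not match line terminators by default, each candidate stays within a single line of the lyrics.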

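You need to use a negative lookahead here (compiled case-insensitively so that She matches the lowercase lyrics):

Pattern.compile("\\bShe\\b(?:(?!\\bshe\\b).)*?\\bloves\\b(?:(?!\\b(?:you|loves)\\b).)*\\byou\\b", Pattern.CASE_INSENSITIVE);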
