
Java Regex not working multiline

I got some kind help last night figuring out a regex to capture the smallest group possible. I need to take a string of lyrics and find a search phrase in it. The problem I am having is that I can't get the regex to match across multiple lines.

I have a text file with lyrics that I read in; this is just part of the song. (The brackets aren't in the text file; I am just using them to show the group I am trying to capture.)

 The first [time we fall in love. 
 Love can be exciting, it can be a bloody bore. 
 Love can be a pleasure or nothing but a chore.
 Love can be like a dull routine, 
 It can run you around until you're out of steam. 
 It can treat you well, it can treat you mean, 
 Love can mess you around, 
 Love can pick you up, it can bring you down]. 
 But they'll never know The feelings we show 

The phrase I am using a regex for is

 time can bring you down

I use a StringBuilder to create the string of lyrics, so the lyrics contain newline (\n) characters. I tried using replaceAll to strip them, but it still didn't work. If I go into the text file and write "time can bring you down" on a single line, it works, but if I split it across two lines it doesn't.

I tried using \\n in my regex, but it ended up capturing most of the song because "time" is the second word. This is the regex I am currently trying to use:

(?is)(\bTime\b)(?:(?!\n\b(?:time|can|bring|you|down)\b\n).)*(\bcan\b)(?:(?!\b(?:time|can|bring|you|down)\b).)*(\bbring\b)(?:(?!\b(?:time|can|bring|you|down)\b).)*(\byou\b)(?:(?!\b(?:time|can|bring|you|down)\b).)*(\bdown\b)

I am trying to capture what is in the brackets in the lyrics above. Here is the method I am using; it takes in the lyrics and the search phrase and returns the length of the string it found.

    static int rankPhrase(String lyrics, String lyricsPhrase) {
        // This takes in song lyrics and the phrase we are searching for

        // Split the phrase up into separate words
        String[] phrase = lyricsPhrase.split("[^a-zA-Z]+");

        // Helper string for regex so we can get smallest grouping
        String regexHelper = lyricsPhrase.replaceAll(" ", "|").toLowerCase();

        // Start to build the regex
        StringBuilder regex = new StringBuilder("(?im)" + "(\\" + "b" + phrase[0] + "\\b)");

        // Loop through each word in the phrase
        for (int i = 1; i < phrase.length; i++) {
            // Add this to the regex we will search for
            regex.append("(?:(?!\\b(?:" + regexHelper + ")\\b).)*(\\b" + phrase[i] + "\\b)");
        }

        // Create the pattern
        Pattern p = Pattern.compile(regex.toString(), Pattern.DOTALL);
        Matcher m = p.matcher(lyrics);

        // String for regex match found
        String regexMatch = "";
        while (m.find()) {
            regexMatch = m.group();
            System.out.println(regexMatch);
        }

        return regexMatch.length();
    }

I will keep trying to figure it out. I think it's a matter of working \\n into the regex, but I'm not 100% sure. Thank you!

You are trying to find a sequence of words within a string. That can be achieved by using word1.*?word2 as a regex: any number of characters may appear between word1 and word2, and the ? after * makes the quantifier lazy, so it matches as few characters as possible.
The problem here is that you are trying to match a pattern across multiple lines. By default the . metacharacter matches any character except a line terminator, so .* stops at the end of a line.
You can overcome this by using (.|\\n)* rather than .* (compiling the pattern with Pattern.DOTALL, which makes . match line terminators too, has the same effect).
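
To make the difference concrete, here is a small sketch that runs the variants discussed above against a two-line string. The class name DotNewlineDemo, the helper firstMatch and the sample text are made up for illustration; only Pattern, Matcher and Pattern.DOTALL come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotNewlineDemo {
    // Hypothetical helper: returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "time can\nbring you down, down";

        // Default mode: '.' does not match the newline, so there is no match (prints null).
        System.out.println(firstMatch("time.*?down", 0, text));

        // Alternation with an explicit \n crosses the line break; lazy stops at the first "down".
        System.out.println(firstMatch("time(.|\\n)*?down", 0, text));

        // DOTALL lets '.' itself match the newline; same result as the line above.
        System.out.println(firstMatch("time.*?down", Pattern.DOTALL, text));

        // Greedy .* keeps going to the last "down" instead of the first.
        System.out.println(firstMatch("time.*down", Pattern.DOTALL, text));
    }
}
```

With DOTALL set, the (.|\\n) alternation is not needed, and a plain .*? is usually the simpler choice.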

I have updated your code below.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Regexa2 {

        // Takes the song lyrics and the phrase we are searching for;
        // returns the length of the last match found (0 if there is none).
        static int rankPhrase(String lyrics, String lyricsPhrase) {
            // Build the regex: join the words with a lazy "any character or newline" gap.
            // String.replace is used here because replaceAll treats the backslash in the
            // replacement as an escape and would turn \n into a plain n.
            String regex = lyricsPhrase.toLowerCase().replace(" ", "(.|\\n)*?");
            System.out.println(regex);

            // Create the pattern. DOTALL also lets '.' match line terminators;
            // CASE_INSENSITIVE lets "time" match "Time".
            Pattern p = Pattern.compile(regex, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(lyrics);

            // Keep the most recent match found
            String regexMatch = "";
            while (m.find()) {
                regexMatch = m.group();
                System.out.println(regexMatch);
            }

            return regexMatch.length();
        }

        public static void main(String[] args) {
            String lyrics = "The first time we fall in love. \n" +
                    "Love can be exciting, it can be a bloody bore. \n" +
                    "Love can be a pleasure or nothing but a chore.\n" +
                    "Love can be like a dull routine, \n" +
                    "It can run you around until you're out of steam. \n" +
                    "It can treat you well, it can treat you mean, \n" +
                    "Love can mess you around, \n" +
                    "Love can pick you up, it can bring you down. \n" +
                    "But they'll never know The feelings we show ";
            String phrase = "time can bring you down";
            Regexa2.rankPhrase(lyrics, phrase);
        }
    }
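
One limitation of joining the raw words is that "time" would also match inside "Sometimes", and any regex metacharacters in the phrase would be interpreted as regex syntax. The following is a rough sketch of a variant that wraps each word in \b word boundaries and quotes it with Pattern.quote. The class PhraseSpan and the methods buildPattern and spanLength are hypothetical names, not part of the code above, and unlike rankPhrase this sketch returns the length of the first match only, which is not guaranteed to be the shortest possible span.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseSpan {
    // Builds \bword1\b.*?\bword2\b... from a whitespace-separated phrase.
    static Pattern buildPattern(String phrase) {
        StringBuilder regex = new StringBuilder();
        for (String word : phrase.trim().split("\\s+")) {
            if (regex.length() > 0) {
                regex.append(".*?"); // lazy gap; crosses lines because of DOTALL below
            }
            // Pattern.quote keeps any regex metacharacters in the word literal.
            regex.append("\\b").append(Pattern.quote(word)).append("\\b");
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    }

    // Returns the length of the first span containing the words in order, or 0 if none.
    static int spanLength(String lyrics, String phrase) {
        Matcher m = buildPattern(phrase).matcher(lyrics);
        return m.find() ? m.group().length() : 0;
    }

    public static void main(String[] args) {
        String lyrics = "Sometimes we wait.\nTime can\nbring you down.";
        // "Sometimes" is skipped; the span runs from "Time" to "down" (23 characters).
        System.out.println(spanLength(lyrics, "time can bring you down"));
    }
}
```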


 