
How to find the word with dot using regex in Java?

I am new to Java. I want to search for a string in a text file. Suppose the file contains:

Hi, I am learning Java.

I am using the pattern below to search for each exact word.

Pattern p = Pattern.compile("\\b" + searchString + "\\b", Pattern.CASE_INSENSITIVE);

It works fine, but it doesn't find "java." How can I find both forms, i.e. the word delimited by word boundaries and the word with a "." at the end? Does anyone have any ideas on how I can solve this problem?

You should process your search string to turn the dot . into a literal regex dot: \. (written "\\." in a Java string literal). Note that an unescaped dot is a metacharacter in regular expressions and matches any character. For example, you can replace every dot in your String with \. before compiling the pattern.

If you don't want to do all that work, just pass "java\\." as your search string.
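
To make the difference concrete, here is a minimal sketch comparing an unescaped and an escaped dot. The class name DotEscapeDemo and the helper found are made up for this illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Pattern;

public class DotEscapeDemo {
    // Hypothetical helper: true if the regex matches anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unescaped dot: any character is accepted after "java".
        System.out.println(found("java.", "javaX"));                    // true
        // Escaped dot: only a literal '.' is accepted after "java".
        System.out.println(found("java\\.", "javaX"));                  // false
        System.out.println(found("java\\.", "Hi i am learning java.")); // true
    }
}
```

So an unescaped "java." can appear to work on the sample text while also matching strings you did not intend.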


Code example:

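import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchWithoutDot {
    public static void main(String[] args) {
        String fileContent = "Hi i am learning java.";
        String searchString = "java";
        Pattern p = Pattern.compile(searchString);
        Matcher m = p.matcher(fileContent);
        // Print the start index and the text of every match.
        while (m.find()) {
            System.out.println(m.start() + " " + m.group());
        }
    }
}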

It would print: 17 java

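import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchWithEscapedDot {
    public static void main(String[] args) {
        String fileContent = "Hi i am learning java.";
        // "\\." in Java source is the regex \. which matches a literal dot.
        String searchString = "java\\.";
        Pattern p = Pattern.compile(searchString);
        Matcher m = p.matcher(fileContent);
        while (m.find()) {
            System.out.println(m.start() + " " + m.group());
        }
    }
}

It would print: 17 java. (note the dot at the end)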

EDIT: As a very basic solution, since the only problem you have is with the dot, you can replace every dot in your search string with an escaped dot (the regex \.) before compiling it.

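import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDotsExample {
    public static void main(String[] args) {
        String fileContent = "Hi i am learning java.";
        String searchString = "java.";
        // Escape every dot; this is harmless if searchString contains no dot.
        // The replacement needs four backslashes: "\\\\." produces the regex \.
        // (a replacement of "\\." would just produce a plain dot again).
        searchString = searchString.replaceAll("\\.", "\\\\.");
        Pattern p = Pattern.compile(searchString);
        Matcher m = p.matcher(fileContent);
        while (m.find()) {
            System.out.println(m.start() + " " + m.group());
        }
    }
}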
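
If the search string may contain other metacharacters besides the dot, one alternative to escaping by hand is the JDK's Pattern.quote, which wraps the whole string so it is matched literally. This is only a sketch of that alternative, not part of the approach above; the class name QuoteDemo and the helper firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: returns "<start> <match>" for the first literal,
    // case-insensitive occurrence of searchString, or null if there is none.
    static String firstMatch(String searchString, String fileContent) {
        Pattern p = Pattern.compile(Pattern.quote(searchString), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(fileContent);
        return m.find() ? m.start() + " " + m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("java.", "Hi i am learning java.")); // 17 java.
        System.out.println(firstMatch("java.", "Hi i am learning javaX")); // null
    }
}
```
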
"\\b" + searchstring + "(?:\\.|\\b)"

If you want to require that the dot be followed by a non-word character or the end of the string, you can add a positive lookahead:

"\\b" + searchstring + "(?:\\.(?=\\W|$)|\\b)"

Alternatively, you can allow non-word characters on either side of the search word:

Pattern p = Pattern.compile(".*\\W*" + searchWord + "\\W*.*", Pattern.CASE_INSENSITIVE);

To be absolutely sure, the above says "find me a bit of text that starts with 0 or more characters, followed by 0 or more non-word characters specifically (\\W*), followed by the search word, followed by 0 or more non-word characters again, followed by anything else". Note that \\W* can also match nothing, so unlike \\b it does not enforce a word boundary: the pattern still matches when the search word sits inside a longer word.

This caters for situations where the search word is at the beginning of the file, at the very end, or between punctuation, e.g. "hi,I am learning,java.".
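
A quick way to check those situations is to run the pattern against a few sample lines. The sketch below does that; the class name LooseMatchDemo and the method looseMatch are hypothetical, and it uses matches() on the assumption that each line is tested as a whole.

```java
import java.util.regex.Pattern;

public class LooseMatchDemo {
    // Hypothetical helper: true if the whole line fits the loose pattern.
    static boolean looseMatch(String searchWord, String line) {
        Pattern p = Pattern.compile(".*\\W*" + searchWord + "\\W*.*", Pattern.CASE_INSENSITIVE);
        return p.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(looseMatch("java", "java is fun"));            // true (start of line)
        System.out.println(looseMatch("java", "I am learning java"));     // true (end of line)
        System.out.println(looseMatch("java", "hi,I am learning,java.")); // true (between punctuation)
        System.out.println(looseMatch("java", "I like python"));          // false
        // \W* may match nothing, so a longer word also passes:
        System.out.println(looseMatch("java", "I like JavaScript"));      // true
    }
}
```
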

Hope this helps...
