
Matching using regular expressions in Java?

I want to find whole words in a text string. The words are separated by spaces and newlines, so I used these two characters to find the beginning and end of each word. With the pattern "\\s" or "\\n" alone, the program finds the indices correctly, but it fails when I try to match both characters in one pattern. How can I fix this program?

import java.util.*;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class convertText{

    public static String findText(String text){

        String r = text.trim();

        // System.out.println(r);

        Pattern pattern = Pattern.compile("\\s+ | \\n");

        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            // System.out.println(matcher.start());
            System.out.println(text.substring(matcher.start()+1));
        }

        return text;
    }

    public static void main(String[] args) {
        // String test = " hi \n ok this. "; 
        String test = " hi ok this. "; 
        // System.out.println(test.substring(7));
        // System.out.println(test);
        findText(test);
    }


}

You can use [^\\s]+ to match any run of characters that are not whitespace (in other words, the words themselves) and print each match:

Pattern pattern = Pattern.compile("[^\\s]+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group());
}

[^\\s]+ can be broken down into:

  • \\s matches any whitespace character, including regular spaces and newlines (so there is no need to specify \\n separately)
  • [ and ] define a character set, which matches any single character listed inside the brackets
  • ^ as the first character inside the set means "not": it inverts the set so that it matches only characters that are not listed (here, anything except whitespace)
  • + matches one or more of the previous token, which here is the character set matching non-whitespace characters
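
Since the question asks for the position of each word, here is a minimal sketch that applies the same [^\\s]+ pattern and also records where each match begins and ends. The class name WordSpans, the method find, and the "word@start-end" output format are made up for illustration; only Pattern, Matcher, start() and end() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordSpans {
    // One or more consecutive non-whitespace characters, i.e. one "word".
    private static final Pattern WORD = Pattern.compile("[^\\s]+");

    // Returns each word as "word@start-end"; end is exclusive, as in Matcher.end().
    static List<String> find(String text) {
        List<String> spans = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            spans.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // Same input as the commented-out test string in the question.
        for (String span : find(" hi \n ok this. ")) {
            System.out.println(span);
        }
    }
}
```

Matching the words directly avoids having to work out word boundaries from the positions of the separators.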

You can also do it with the Java 8 Stream API, as follows:

String test = " hi ok this. ";
Pattern.compile("\\W+").splitAsStream(test.trim())
            .forEach(System.out::println);

Output:

hi
ok
this
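
Note that the trailing period is missing from the output above: \\W+ splits on every run of non-word characters, not just on whitespace. The sketch below puts the two delimiters side by side so the difference is visible. SplitComparison and splitOn are hypothetical names chosen for this example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitComparison {
    // Splits the trimmed text on the given delimiter regex.
    static List<String> splitOn(String delimiterRegex, String text) {
        return Pattern.compile(delimiterRegex)
                .splitAsStream(text.trim())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String test = " hi ok this. ";
        // \W+ treats punctuation as a delimiter, so the period is dropped.
        System.out.println(splitOn("\\W+", test));
        // \s+ splits on whitespace only, so the period stays attached to the word.
        System.out.println(splitOn("\\s+", test));
    }
}
```

If punctuation should stay attached to the words, \\s+ is probably the delimiter you want; if it should be stripped, \\W+ does that, at the cost of also splitting words such as "don't" at the apostrophe.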

If you want to match all words in a text string you can use:

(?i)[a-z]+ java escaped: "(?i)[a-z]+"

(?i) ... Turn on case-insensitive matching.
[a-z]+ ... Match any letter from a to z, as many times as possible.

or you could use:

\\w+ ... Matches an ASCII letter, digit, or underscore, as many times as possible.


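    // Requires: import java.util.regex.*; and import javax.swing.JOptionPane;
    try {
        String subjectString = " hi ok this. ";
        // (?i) already makes the match case-insensitive, so no extra flags are needed
        Pattern regex = Pattern.compile("(?i)[a-z]+");
        Matcher regexMatcher = regex.matcher(subjectString);
        while (regexMatcher.find()) {
            String word = regexMatcher.group();
            int startPos = regexMatcher.start(); // index of the first character of the word
            int endPos = regexMatcher.end();     // index just past the last character
            JOptionPane.showMessageDialog(null, word + " found from pos: " + startPos + " to " + endPos);
        }
    } catch (PatternSyntaxException ex) {
        // Thrown only if the pattern itself is malformed; report it instead of ignoring it
        System.err.println(ex.getMessage());
    }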
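
The two patterns in this answer do not treat digits and underscores the same way. As a rough illustration, the sketch below runs both over the same input; the class WordPatterns, the helper matches, and the sample text "user_1 Ok 42" are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordPatterns {
    // Collects every match of regex in text, in order of appearance.
    static List<String> matches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "user_1 Ok 42";
        // Letters only: the underscore and the digits end or interrupt a match.
        System.out.println(matches("(?i)[a-z]+", text));
        // Word characters: letters, digits and underscores all belong to a word.
        System.out.println(matches("\\w+", text));
    }
}
```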

\\s does not match only the space character. It matches ASCII space, tab, line feed, carriage return, vertical tab, and form feed, so \\s+ alone is enough to match any run of whitespace characters.
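
A likely reason the pattern in the question finds nothing: unless Pattern.COMMENTS is set, the spaces written around | in "\\s+ | \\n" are literal characters. The first alternative then requires whitespace followed by a space, and the second requires a space followed by a newline, so neither can match a single separator. The sketch below compares the match positions of both patterns; WhitespaceStarts and starts are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceStarts {
    // Start index of every match of regex in text.
    static List<Integer> starts(String regex, String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String test = " hi ok this. ";
        // The spaces around | are literal, so each alternative needs
        // at least two adjacent whitespace characters to match.
        System.out.println(starts("\\s+ | \\n", test));
        // \s+ on its own matches every run of whitespace, including single spaces.
        System.out.println(starts("\\s+", test));
    }
}
```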

Just split the string on the whitespace character class:

String[] words = yourString.split("\\s+");
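
One thing to watch for with split: if the string starts with whitespace, as the test string in the question does, the resulting array begins with an empty string. A small sketch of one way to handle that, trimming first and treating blank input as "no words", follows. SplitWords and words are illustrative names, not part of any library.

```java
import java.util.Arrays;

class SplitWords {
    // Splits text into whitespace-separated words, ignoring leading and trailing whitespace.
    static String[] words(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            // "".split(...) would return one empty string, so handle blank input explicitly.
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String test = " hi ok this. ";
        // Without trimming, the leading space produces an empty first element.
        System.out.println(Arrays.toString(test.split("\\s+")));
        System.out.println(Arrays.toString(words(test)));
    }
}
```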
