
Java Regex to Match words + spaces

I am trying to construct this simple regex to match words + whitespace in Java, but I got confused trying to work it out. There are a lot of similar examples on this site, but the answers mostly give out the regex itself without explaining how it is constructed.

What I'm looking for is the Line of Thought behind forming the regular expression.

Sample Input String:

String Tweet = "\"Whole Lotta Love\" - Led Zeppelin";

which when printed is: "Whole Lotta Love" - Led Zeppelin

Problem Statement:

I want to find out if a String has a quotation in it. In the above sample string, Whole Lotta Love is the quotation.

What I've tried:

My first approach was to match anything between two double quotes, so I came up with the following regex:

"\"(\\w+\")" and "\"(^\")"

But this approach only works if there are no spaces between the two double quotes, like:

"Whole" Lotta Love

So I tried to modify my regex to match spaces, and this is where I got lost.

I tried the following, but none of them match:

"\"(\\w+?\\s+\")" , "\"(\\w+)(\\s+)\"" , "\"(\\w+)?(\\s+)\""

I would appreciate it if someone could help me figure out how to construct this.

You almost had it. Your regexes would match alphanumeric characters followed by spaces, like this:

"Whole "

but not any alphanumeric chars after that. zEro is almost right, but you probably want to use a capture like this:

"\"([\\w\\s]+)\""

This matches one or more characters, each of which is either whitespace or a word character. Note that \w also matches the underscore ( _ ), not just letters and digits.
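To make the difference concrete, here is a minimal sketch that runs one of the attempts from the question next to the character-class version. The class name WordSpaceDemo and the helper firstMatch are made up for illustration; only java.util.regex is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordSpaceDemo {
    // Attempt from the question: word characters, then whitespace, then the closing quote.
    static final Pattern ATTEMPT = Pattern.compile("\"(\\w+)(\\s+)\"");
    // Character class: any mix of word characters and whitespace between the quotes.
    static final Pattern WORDS_AND_SPACES = Pattern.compile("\"([\\w\\s]+)\"");

    // Returns the first match (quotes included), or null if nothing matches.
    static String firstMatch(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String tweet = "\"Whole Lotta Love\" - Led Zeppelin";
        // The attempt only accepts one word followed by spaces: prints "Whole "
        System.out.println(firstMatch(ATTEMPT, "\"Whole \" Lotta Love"));
        // A second word after the space breaks it: prints null
        System.out.println(firstMatch(ATTEMPT, tweet));
        // The character class allows words and spaces in any order: prints "Whole Lotta Love"
        System.out.println(firstMatch(WORDS_AND_SPACES, tweet));
    }
}
```

The attempt fixes the order (words first, then spaces, then the quote), while the character class lets the two alternate freely, which is what a multi-word quotation needs.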

If you want to be more general, you could use

"\"([^\"]+)\""

which will match everything besides double quotes. For instance, "Who's on first?" (including the quotes) would be matched by the second regex but not by the first, since it includes punctuation.
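As a quick check of that claim, the following sketch applies both patterns to a quotation that contains punctuation. The class NegatedClassDemo and its quoted helper are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Only word characters and whitespace are allowed between the quotes.
    static final Pattern WORDS_AND_SPACES = Pattern.compile("\"([\\w\\s]+)\"");
    // Anything except a double quote is allowed between the quotes.
    static final Pattern ANYTHING_BUT_QUOTE = Pattern.compile("\"([^\"]+)\"");

    // Returns the text between the first pair of quotes, or null if nothing matches.
    static String quoted(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"Who's on first?\"";
        // The apostrophe and question mark are not in [\w\s]: prints null
        System.out.println(quoted(WORDS_AND_SPACES, input));
        // The negated class accepts them: prints Who's on first?
        System.out.println(quoted(ANYTHING_BUT_QUOTE, input));
    }
}
```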

The simplest way would be to have a while loop looking for anything in between two quotes in your input, so you check for multiple quoted expressions.

My example here accepts anything in between two quotes. You can refine with only alphabetics and spaces.

String quotedTweet = "\"Whole Lotta Love\" - Led Zeppelin";
String unquotedTweet = "Whole Lotta Love from Led Zeppelin";
String multipleQuotes = "\"Whole Lotta Love\" - \"Led\" Zeppelin";
// commented Pattern for only alphabetics or spaces
// Pattern pattern = Pattern.compile("\"([\\p{Alpha}\\p{Space}]+?)\"");
Pattern pattern = Pattern.compile("\"(.+?)\"");
Matcher matcher = pattern.matcher(quotedTweet);
while (matcher.find()) {
    // will find "Whole Lotta Love"
    System.out.println(matcher.group(1));
}
matcher = pattern.matcher(unquotedTweet);
while (matcher.find()) {
    // will find nothing
    System.out.println(matcher.group(1));
}
matcher = pattern.matcher(multipleQuotes);
while (matcher.find()) {
    // Will find "Whole Lotta Love" and "Led"
    System.out.println(matcher.group(1));
}

Edit: neither this example nor the commented variant rejects a quotation that contains only whitespace, as in " ". Let me know if that's a requirement; the Pattern would be a bit more complicated in that case.
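If whitespace-only quotations do need to be rejected, one option that avoids a more complicated Pattern is to keep a simple pattern and filter the matches in code. This is only a sketch of that alternative, not the pattern the answer had in mind; the class NonBlankQuotes and its find method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonBlankQuotes {
    // Anything except a double quote, between two double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    // Collects every quoted section whose content is not just whitespace.
    static List<String> find(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            String content = matcher.group(1);
            // Skip quotations such as " " that contain only whitespace.
            if (!content.trim().isEmpty()) {
                result.add(content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Whole Lotta Love, Led]
        System.out.println(find("\"Whole Lotta Love\" - \"Led\" Zeppelin"));
        // The blank quotation is consumed and dropped: prints [Led]
        System.out.println(find("\" \" and \"Led\""));
    }
}
```

Filtering after the match also keeps the quote pairing intact: the blank quotation is still consumed as a pair, so its closing quote is not mistaken for the opening quote of the next one.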

Output:

Whole Lotta Love
Whole Lotta Love
Led

You can use this:

\"(?>\\w+ *)+\"

or a character class, as zEro suggests.
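A small sketch of the atomic-group pattern applied to the original problem (does the string contain a quotation?). The class AtomicGroupDemo and the method hasQuotation are illustrative names, not part of the answer. The atomic group (?>...) does not give characters back once it has matched, which keeps the nested quantifiers from backtracking heavily on input with an unclosed quote.

```java
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // A quote, then one or more "word followed by optional spaces" units, then a quote.
    static final Pattern QUOTED_WORDS = Pattern.compile("\"(?>\\w+ *)+\"");

    // True if the input contains at least one quoted run of words.
    static boolean hasQuotation(String input) {
        return QUOTED_WORDS.matcher(input).find();
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(hasQuotation("\"Whole Lotta Love\" - Led Zeppelin"));
        // No quotes at all: prints false
        System.out.println(hasQuotation("Whole Lotta Love from Led Zeppelin"));
        // Punctuation inside the quotes is not matched by \w: prints false
        System.out.println(hasQuotation("\"Who's on first?\""));
    }
}
```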

[\w\s]+

We can use this when we need to pull out a run of words. For example, to grab the sentence from "hi I am Sandun", we can use "[\\w\\s]+" as the Java string.
