
Java Regex to Match words + spaces

I am trying to construct a simple regex to match words + whitespace in Java, but I got confused trying to work it out. There are a lot of similar examples on this site, but the answers mostly give out the regex itself without explaining how it is constructed.

What I'm looking for is the line of thought behind forming the regular expression.

Sample Input String:

String Tweet = "\"Whole Lotta Love\" - Led Zeppelin";

which when printed is: "Whole Lotta Love" - Led Zeppelin

Problem Statement:

I want to find out if a String has a quotation in it. In the above sample string, Whole Lotta Love is the quotation.

What I've tried:

My first approach was to match anything between two double quotes, so I came up with the following regexes:

"\"(\\w+\")" and "\"(^\")"

But this approach only works if there are no spaces between the two double quotes, like:

"Whole" Lotta Love

So I tried to modify my regex to match spaces, and this is where I got lost.

I tried the following, but they don't match:

"\"(\\w+?\\s+\")" , "\"(\\w+)(\\s+)\"" , "\"(\\w+)?(\\s+)\""

I would appreciate it if someone could help me figure out how to construct this.

You almost had it. Your regexes would match alphanumeric characters followed by spaces, like this:

"Whole "

but not any alphanumeric characters after that. zEro is almost right, but you probably want to use a capture like this:

"\"([\\w\\s]+)\""

This matches one or more whitespace or alphanumeric characters between the quotes. Note that \w also matches the underscore, _.

If you want to be more general, you could use
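To make the difference concrete, here is a minimal sketch that runs one of the question's attempts and the character-class version against the sample string. The class name CharacterClassDemo and the helper firstQuote are invented for this illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Hypothetical helper: returns capture group 1 of the first match,
    // or null when the regex does not match anywhere in the input.
    static String firstQuote(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tweet = "\"Whole Lotta Love\" - Led Zeppelin";

        // One word, then whitespace, then the closing quote:
        // the second word "Lotta" gets in the way, so nothing matches.
        System.out.println(firstQuote("\"(\\w+\\s+)\"", tweet));

        // Any mix of word characters and whitespace, in any order.
        System.out.println(firstQuote("\"([\\w\\s]+)\"", tweet));
    }
}
```

The first pattern insists on a fixed order (word, then spaces, then the quote), while the character class lets word characters and whitespace alternate freely, which is what a multi-word quotation needs.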

"\"([^\"]+)\""

which will match any run of characters other than double quotes. For instance, "Who's on first?" (including the quotes) would be matched by the second regex but not by the first, since it includes punctuation.
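As a rough check of that claim, the sketch below tests both patterns against a quotation containing punctuation. It also answers the original yes/no question (does the string contain a quotation?) by calling find() and ignoring the captured text. The class name NegatedClassDemo, the helper hasQuote, and the sample sentence are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Hypothetical helper: true when the regex matches somewhere in the input.
    static boolean hasQuote(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "\"Who's on first?\" he asked";

        // Only word characters and whitespace allowed: the apostrophe and
        // the question mark prevent a match.
        System.out.println(hasQuote("\"([\\w\\s]+)\"", text)); // false

        // Anything except a double quote is allowed between the quotes.
        System.out.println(hasQuote("\"([^\"]+)\"", text));    // true
    }
}
```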

The simplest way would be to have a while loop looking for anything in between two quotes in your input, so that you also catch multiple quoted expressions.

My example here accepts anything in between two quotes. You can refine it to accept only alphabetic characters and spaces.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String quotedTweet = "\"Whole Lotta Love\" - Led Zeppelin";
String unquotedTweet = "Whole Lotta Love from Led Zeppelin";
String multipleQuotes = "\"Whole Lotta Love\" - \"Led\" Zeppelin";
// Alternative Pattern (commented out) that accepts only alphabetic characters or whitespace
// Pattern pattern = Pattern.compile("\"([\\p{Alpha}\\p{Space}]+?)\"");
Pattern pattern = Pattern.compile("\"(.+?)\"");
Matcher matcher = pattern.matcher(quotedTweet);
while (matcher.find()) {
    // will find "Whole Lotta Love"
    System.out.println(matcher.group(1));
}
matcher = pattern.matcher(unquotedTweet);
while (matcher.find()) {
    // will find nothing
    System.out.println(matcher.group(1));
}
matcher = pattern.matcher(multipleQuotes);
while (matcher.find()) {
    // Will find "Whole Lotta Love" and "Led"
    System.out.println(matcher.group(1));
}

Edit: this example and the commented variant will not prevent quoted whitespace, as in " ". Let me know if that's a requirement - the Pattern would be a bit more complicated in that case.

Output:

Whole Lotta Love
Whole Lotta Love
Led
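As a possible follow-up to the edit note above, one way to ignore whitespace-only quotations such as " " is to match every quoted segment and then discard the blank ones in code. This is only a sketch of that alternative, not the more complicated Pattern the answer alludes to; the class name NonBlankQuotes and the method find are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonBlankQuotes {
    // Matches a pair of double quotes and captures whatever lies between
    // them, including nothing at all, so that quotes are always consumed in pairs.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Hypothetical helper: returns every quotation that contains at least
    // one non-whitespace character.
    static List<String> find(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            String content = m.group(1);
            if (!content.trim().isEmpty()) {
                result.add(content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(find("\" \" and \"Led\""));                            // [Led]
        System.out.println(find("\"Whole Lotta Love\" - \"Led\" Zeppelin"));      // [Whole Lotta Love, Led]
    }
}
```

Filtering after the match keeps the regex simple and avoids a pitfall of stricter patterns: if the blank pair is not consumed as a pair, its closing quote can be mistaken for the opening quote of the next quotation.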

You can use this:

\"(?>\\w+ *)+\"

or a character class, as zEro suggests.
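For readers unfamiliar with the (?>...) syntax: it is an atomic (independent, non-capturing) group, which does not give characters back once it has matched. Here each repetition consumes one word plus any trailing spaces. The sketch below tries the pattern on two inputs; the class name AtomicGroupDemo and the helper firstMatch are assumptions for illustration. Since the pattern has no capturing group, group() returns the whole match, quotes included.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // Quote, then one or more "word followed by optional spaces" units, then quote.
    private static final Pattern QUOTED_WORDS = Pattern.compile("\"(?>\\w+ *)+\"");

    // Hypothetical helper: returns the full text of the first match, or null.
    static String firstMatch(String input) {
        Matcher m = QUOTED_WORDS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Prints the quotation with its surrounding quotes.
        System.out.println(firstMatch("\"Whole Lotta Love\" - Led Zeppelin"));
        // Punctuation inside the quotes is not allowed by \w, so this prints null.
        System.out.println(firstMatch("\"Who's on first?\" he asked"));
    }
}
```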

[\w\s]+

We can use this when we need to pick out sentences. For example, to grab the sentence from "hi I am Sandun", we can use "+[\\w\\s]+" .
