
Regex to match first word in sentence

I am looking for a regex that matches the first word in a sentence, excluding punctuation and white space. For example: "This" in "This is a sentence." and "First" in "First, I would like to say \"Hello!\"".

This doesn't work:

"""([A-Z].*?(?=^[A-Za-z]))""".r

(?:^|(?:[.!?]\s))(\w+)

This will match the first word in every sentence.

http://rubular.com/r/rJtPbvUEwx
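
As a rough check of the pattern above, here is a minimal Python sketch. The names `FIRST_WORDS` and `first_words` are made up for illustration and are not part of the answer. It relies on `re.findall` returning only the captured group when a pattern contains exactly one group.

```python
import re

# Pattern from the answer above: start of the string, or sentence-ending
# punctuation followed by one whitespace character, then a captured word.
FIRST_WORDS = re.compile(r"(?:^|(?:[.!?]\s))(\w+)")


def first_words(text):
    """Return the first word of each sentence found in text (illustrative helper)."""
    return FIRST_WORDS.findall(text)


print(first_words("This is a sentence. First, I would like to say hello! Then I stop."))
```

Note that a sentence is only detected when the terminator is followed by exactly one whitespace character and then a word character, so a sentence that begins with a quotation mark would be skipped by this pattern.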

[a-z]+

This should be enough, as it will match the first run of a-z characters (assuming case-insensitive matching).

In case it doesn't work, you could try [a-z]+\b, or even ^[a-z]+\b, but the last one assumes that the string starts with the word.
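
A minimal Python sketch of this suggestion, assuming the pattern is applied with a search (first match anywhere in the string) and the case-insensitive flag. `WORD` and `first_word` are hypothetical names used only for this example.

```python
import re

# [a-z]+ with IGNORECASE: the first run of ASCII letters anywhere in the string.
WORD = re.compile(r"[a-z]+", re.IGNORECASE)


def first_word(sentence):
    """Return the first run of letters in sentence, or None if there is none."""
    match = WORD.search(sentence)
    return match.group(0) if match else None


print(first_word("This is a sentence."))
print(first_word('First, I would like to say "Hello!"'))
print(first_word('  "Hello!" she said'))
```

Because the search skips anything that is not a letter, leading spaces, quotes, and digits are ignored. The flip side is that a word such as "Don't" is cut at the apostrophe and "123 go" yields "go".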

You can use this regex: ^[^\s]+ or ^[^ ]+
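
A short Python sketch of what this pattern returns. `first_token` and `first_word_stripped` are illustrative names, and the punctuation-stripping step is an addition for this example, not part of the answer.

```python
import re
import string


def first_token(sentence):
    """Return the leading run of non-whitespace characters, or None."""
    match = re.search(r"^[^\s]+", sentence)
    return match.group(0) if match else None


def first_word_stripped(sentence):
    """Same as first_token, with surrounding ASCII punctuation removed (assumed extra step)."""
    token = first_token(sentence)
    return token.strip(string.punctuation) if token else None


print(first_token("First, I would like to say hello"))
print(first_word_stripped("First, I would like to say hello"))
```

The pattern keeps any punctuation attached to the word (the comma in "First," above), and it finds nothing when the string starts with whitespace, so it only meets the question's requirement with some post-processing.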

This is an old thread, but people might need this like I did. None of the above work if your sentence starts with one or more spaces. I did this to get the first (non-empty) word in the sentence:

(?<=^[\s"']*)(\w+)

Explanation:

(?<=^[\s"']*) is a positive lookbehind that looks for the start of the string, followed by zero or more whitespace or punctuation characters (you can add more between the brackets), without including them in the match.
(\w+) is the actual match of the word, which will be returned.

The remaining words in the sentence are not matched, because they do not satisfy the lookbehind.
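
Not every regex engine accepts a variable-length lookbehind like this one. Python's standard `re` module, for example, requires lookbehinds to be fixed-width and rejects the pattern at compile time. The sketch below checks that and then falls back to an equivalent that consumes the prefix and captures the word instead. `LOOKBEHIND`, `FALLBACK`, `compiles`, and `first_word` are names made up for this example.

```python
import re

# The pattern from the answer, kept as a plain string so it can be tested.
LOOKBEHIND = r"""(?<=^[\s"']*)(\w+)"""

# Assumed equivalent for engines without variable-length lookbehind:
# consume the leading whitespace/quotes, then capture the word in group 1.
FALLBACK = re.compile(r"""^[\s"']*(\w+)""")


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def first_word(sentence):
    """Return the first word after optional leading whitespace or quotes."""
    match = FALLBACK.search(sentence)
    return match.group(1) if match else None


print(compiles(LOOKBEHIND))
print(first_word('   "Hello," she said'))
```

With the fallback, the word is read from group 1 rather than from the whole match, since the leading characters are now part of the match itself.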

You can use this regex: ^\s*([a-zA-Z0-9]+)

The first word is then available in the captured group.
