
Regular expression to match string starting with a specific word

How do I create a regular expression to match a word at the beginning of a string?

We are looking to match stop at the beginning of a string, and anything can follow it.

For example, the expression should match:

stop
stop random
stopping

If you wish to match only lines beginning with stop, use

^stop

If you wish to match lines beginning with the word stop followed by a whitespace character, use

^stop\s

Or, if you wish to match lines beginning with the word stop followed by either a space or any other non-word character, you can use (your regex flavor permitting)

^stop\W
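As a rough check, the three patterns above can be tried against sample strings like the ones in the question. This is a minimal sketch that assumes Python's re module (the question does not name a language); the helper name matches_at_start and the sample list are made up for illustration.

```python
import re

# Sample strings; the last one is added to show a non-match.
SAMPLES = ["stop", "stop random", "stopping", "please stop"]

def matches_at_start(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in text.

    The ^ anchor inside the pattern is what pins the match to the start.
    """
    return re.search(pattern, text) is not None

for pattern in (r"^stop", r"^stop\s", r"^stop\W"):
    hits = [s for s in SAMPLES if matches_at_start(pattern, s)]
    print(pattern, "->", hits)
# ^stop   -> ['stop', 'stop random', 'stopping']
# ^stop\s -> ['stop random']
# ^stop\W -> ['stop random']
```

Note that ^stop\s and ^stop\W both reject a bare stop, because they require one more character after the word.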

On the other hand, the following matches a word at the beginning of a string in most regex flavors (in these flavors \w matches the opposite of \W):

^\w+

If your flavor does not have the \w shortcut, you can use

^[a-zA-Z0-9]+

Be wary that this second idiom will only match ASCII letters and digits, no symbols whatsoever (in most flavors \w also matches the underscore).

Check your regex flavor manual to learn which shortcuts are allowed, what exactly they match, and how they deal with Unicode.
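To illustrate how much the flavor matters, here is a small sketch of the difference in Python 3, where \w on str patterns is Unicode-aware by default. The sample text is invented for the example, and other flavors may behave differently.

```python
import re

# "na\u00efve" is "naive" spelled with a precomposed i-diaeresis.
text = "na\u00efve_word rest"

# Default str patterns: \w covers Unicode letters, digits and the underscore.
unicode_word = re.match(r"^\w+", text).group()

# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_word = re.match(r"^\w+", text, flags=re.ASCII).group()

# The explicit class matches ASCII letters and digits only.
class_word = re.match(r"^[a-zA-Z0-9]+", text).group()

print(unicode_word)  # the whole first word, underscore included
print(ascii_word)    # na
print(class_word)    # na
```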

Try this:

/^stop.*$/

Explanation:

  • The / characters delimit the regular expression (i.e. they are not part of the regex per se)
  • ^ means match at the beginning of the line
  • . followed by * means match any character (.), any number of times (*)
  • $ means to the end of the line

If you would like to enforce that stop be followed by whitespace, you could modify the regex like so:

/^stop\s+.*$/
  • \s means any whitespace character
  • + following the \s means there has to be at least one whitespace character after the word stop

Note: Keep in mind that the regex above requires the word stop to be followed by whitespace! So it wouldn't match a line that contains only: stop
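The difference between the two patterns in this answer can be seen in a short sketch. It assumes Python, where patterns are plain strings without the / delimiters; the names ANY_TAIL and NEEDS_SPACE are invented for the example.

```python
import re

ANY_TAIL = re.compile(r"^stop.*$")        # stop, then anything
NEEDS_SPACE = re.compile(r"^stop\s+.*$")  # stop, at least one whitespace, then anything

for line in ["stop", "stop random", "stopping", "please stop"]:
    print(f"{line!r}: any_tail={bool(ANY_TAIL.search(line))}, "
          f"needs_space={bool(NEEDS_SPACE.search(line))}")
# 'stop': any_tail=True, needs_space=False
# 'stop random': any_tail=True, needs_space=True
# 'stopping': any_tail=True, needs_space=False
# 'please stop': any_tail=False, needs_space=False
```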

If you want to match anything after the word stop, and not only at the start of the line, you may use \bstop.*\b, which matches the word followed by the rest of the line (up to the last word boundary).

Or, if you want to match just the word within the string, use \bstop[a-zA-Z]*, which matches only the words starting with stop.

Or, for lines that start with stop, use ^stop[a-zA-Z]* to match the first word only.
For the whole line, use ^stop.*, which matches the first line of the string only.

And if you want to match every string starting with stop, including any newlines it contains, use /^stop.*/s, which matches a multiline string starting with stop.
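How the flags change the result can be sketched in Python, assuming that re.DOTALL plays the role of the s flag and re.MULTILINE the role of the m flag. The sample text is invented for the example.

```python
import re

text = "stop here\nand continue\nstop again"

# Without flags, . does not cross the first newline.
first_line = re.search(r"^stop.*", text).group()

# re.DOTALL lets . match newlines, so the match runs to the end of the string.
whole_string = re.search(r"^stop.*", text, flags=re.DOTALL).group()

# re.MULTILINE lets ^ match at the start of every line.
each_line = re.findall(r"^stop.*", text, flags=re.MULTILINE)

print(repr(first_line))         # 'stop here'
print(whole_string == text)     # True
print(each_line)                # ['stop here', 'stop again']
```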

Like @SharadHolani said, this won't match every word beginning with "stop", only one at the beginning of a line, as in "stop going". @Waxo gave the right answer:

This one is slightly better, if you want to match any word beginning with "stop" and containing nothing but letters from A to Z.

\bstop[a-zA-Z]*\b

This would match all of:

stop (1)

stop random (2)

stopping (3)

want to stop (4)

please stop (5)

But

/^stop[a-zA-Z]*/

would only match (1) through (3), but not (4) and (5).
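The claim about the five numbered examples can be checked with a short sketch. It assumes Python's re module; the names ANYWHERE and AT_START are invented for the example.

```python
import re

SAMPLES = ["stop", "stop random", "stopping", "want to stop", "please stop"]

ANYWHERE = re.compile(r"\bstop[a-zA-Z]*\b")  # a stop-word anywhere in the string
AT_START = re.compile(r"^stop[a-zA-Z]*")     # a stop-word at the very beginning

for number, text in enumerate(SAMPLES, start=1):
    anywhere = ANYWHERE.search(text)
    at_start = AT_START.search(text)
    print(number,
          anywhere.group() if anywhere else None,
          at_start.group() if at_start else None)
# 1 stop stop
# 2 stop stop
# 3 stopping stopping
# 4 stop None
# 5 stop None
```

The leading \b also keeps the first pattern from matching inside a longer word such as unstoppable.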

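/stop[a-zA-Z]*/

This will match any stop word (stop, stopped, stopping, etc.). Note that with + instead of * the pattern would require at least one letter after stop and so would not match stop on its own.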

However, if you just want to match "stop" at the start of a string

/^stop/

will do :D

If you want to match anything that starts with "stop", including "stop going", "stop" and "stopping", use:

^stop

If you want to match the word stop followed by anything, as in "stop going" or "stop this", but not "stopped" or "stopping", use:

^stop\W

If you want the word to start with "stop", you can use the following pattern: "^stop.*"

This will match words starting with stop followed by anything.

I'd advise against a simple regular expression approach to this problem. There are too many words that are substrings of other unrelated words, and you'll probably drive yourself crazy trying to overadapt the simpler solutions already provided.

You'll want at least a naive stemming algorithm (try the Porter stemmer; free code is available in most languages) to process the text first. Keep this processed text and the original, unprocessed text in two separate space-split arrays. Make sure each non-alphabetical character also gets its own index in these arrays. Whatever list of words you're filtering, stem them as well.

The next step would be to find the array indices that match your list of stemmed 'stop' words. Remove those from the unprocessed array, and then rejoin on spaces.

This is only slightly more complicated, but it will be a much more reliable approach. If you have any doubts about the value of a more NLP-oriented approach, you might want to do some research into clbuttic mistakes.
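The steps described in this answer can be sketched roughly as follows. This is only an illustration under stated assumptions: naive_stem is a toy suffix stripper standing in for a real stemmer and is not the Porter algorithm, and the function names tokenize and remove_words are invented for the example.

```python
import re

def naive_stem(word: str) -> str:
    """Toy stand-in for a real stemmer such as Porter's (NOT the Porter algorithm)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Strip one common suffix, but leave at least three characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final consonant, e.g. "stopp" -> "stop".
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

def tokenize(text: str) -> list:
    # Runs of letters become tokens; every other non-space character gets its own slot.
    return re.findall(r"[A-Za-z]+|[^A-Za-z\s]", text)

def remove_words(text: str, banned: list) -> str:
    banned_stems = {naive_stem(w) for w in banned}   # stem the filter list too
    original = tokenize(text)                        # unprocessed array
    stemmed = [naive_stem(tok) for tok in original]  # processed array, same indices
    kept = [tok for tok, stem in zip(original, stemmed) if stem not in banned_stems]
    return " ".join(kept)                            # rejoin on spaces

print(remove_words("Please stop stopping, the unstoppable bus stopped!", ["stop"]))
# Please , the unstoppable bus !
```

In this sketch stop, stopping and stopped are removed while unstoppable survives. Rejoining on spaces leaves a space before punctuation, which a real implementation would probably want to tidy up.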

/^stop.*$/i

i - ignore case: with this flag the match is case-insensitive
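A short sketch of the flag's effect, assuming Python, where the i flag is spelled re.IGNORECASE. It also shows why the dot before the * matters: in ^stop*$ the * applies only to the final p. The names PATTERN and WITHOUT_DOT are invented for the example.

```python
import re

PATTERN = re.compile(r"^stop.*$", flags=re.IGNORECASE)
WITHOUT_DOT = re.compile(r"^stop*$", flags=re.IGNORECASE)  # "sto" plus zero or more "p"

for line in ["stop", "STOP now", "Stopping", "sto", "don't stop"]:
    print(repr(line), bool(PATTERN.search(line)), bool(WITHOUT_DOT.search(line)))
# 'stop' True True
# 'STOP now' True False
# 'Stopping' True False
# 'sto' False True
# "don't stop" False False
```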

Can you try this:

https://regex101.com/r/P3qfKG/1

reg = /stop(\w+| [^ ]+|$)/gm

It will select stop on its own, words that start with stop, and stop followed by the next word.
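For reference, here is a sketch of the same pattern in Python. It assumes that re.MULTILINE stands in for the m flag and that finditer stands in for the g flag; the sample text is invented for the example.

```python
import re

reg = re.compile(r"stop(\w+| [^ ]+|$)", flags=re.MULTILINE)

text = "stop\nstop random words\nstopping\nplease stop"

# Collect every match, as the g flag would.
matches = [m.group(0) for m in reg.finditer(text)]
print(matches)
# ['stop', 'stop random', 'stopping', 'stop']
```

One caveat: [^ ] excludes only the space character, so when the word after stop ends a line, the match can run across the newline into the next line.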
