
Regex to find words that start with a specific character

I am trying to find words that start with a specific character, for example:

Lorem ipsum #text Second lorem ipsum. How #are You. It's ok. Done. Something #else now.

I need to get all words that start with "#", so my expected results are #text, #are, #else.

Any ideas?

Search for:

  • something that is not a word character, then
  • #
  • some word characters

So try this:

/(?<!\w)#\w+/

Or in C# it would look like this:

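using System;
using System.Text.RegularExpressions;

string s = "Lorem ipsum #text Second lorem ipsum. How #are You. It's ok. Done. Something #else now.";
// Print every "#word" that is not preceded by a word character.
foreach (Match match in Regex.Matches(s, @"(?<!\w)#\w+"))
{
    Console.WriteLine(match.Value);
}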

Output:

#text
#are
#else
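
For comparison, here is a minimal Python sketch of the same pattern. The `find_tags` helper and the second input string are assumptions made up for illustration, not part of the answer. It also shows what the lookbehind buys: a # glued to a preceding word character, as in foo#bar, is skipped.

```python
import re

# Same pattern as above: '#' not preceded by a word character,
# followed by one or more word characters.
TAG_RE = re.compile(r"(?<!\w)#\w+")


def find_tags(text):
    """Return every '#word' token in text (hypothetical helper)."""
    return TAG_RE.findall(text)


sample = ("Lorem ipsum #text Second lorem ipsum. How #are You. "
          "It's ok. Done. Something #else now.")

print(find_tags(sample))         # ['#text', '#are', '#else']
print(find_tags("foo#bar #ok"))  # ['#ok']
```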

Try this: #(\S+)\s?
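
One thing to keep in mind with \S+ is that it takes every non-whitespace character, so trailing punctuation ends up inside the captured group. A small Python sketch of that behavior, with made-up input strings:

```python
import re

# '#', then a captured run of non-whitespace, then an optional whitespace character.
pattern = re.compile(r"#(\S+)\s?")

sample = "Lorem ipsum #text Second lorem ipsum. How #are You. Something #else now."
words = pattern.findall(sample)
print(words)  # ['text', 'are', 'else']

# Punctuation directly after the word is captured as well.
trailing = pattern.findall("All #done. See #you!")
print(trailing)  # ['done.', 'you!']
```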

Match a word starting with # after whitespace or at the beginning of a line. The trailing word boundary may not be necessary, depending on your usage.

/(?:^|\s)\#(\w+)\b/

The parentheses capture your word in a group. How you apply this regex depends on the language.

The (?:...) is a non-capturing group.
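
As one example of applying it, here is a Python sketch. It assumes `re.MULTILINE` is wanted so that ^ matches at the start of every line, and the sample text is made up. Because the pattern has exactly one capturing group, `findall` returns the captured words without the #.

```python
import re

# (?:^|\s) is non-capturing, so group 1 is the word itself.
pattern = re.compile(r"(?:^|\s)#(\w+)\b", re.MULTILINE)

sample = "#first line\nHow #are You. foo#bar Something #else now."
captured = pattern.findall(sample)
print(captured)  # ['first', 'are', 'else']
```

Note that foo#bar is not matched, because the # there follows neither whitespace nor a line start.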

To accommodate different languages I have this (PCRE/PHP):

'~(?<!\p{Latin})#(\p{Latin}+)~u'

or

$language = 'ex. get form value';
'~(?<!\p{' . $language . '})#(\p{' . $language . '}+)~u'

or, to cycle through multiple scripts:

$languages = $languageArray;

$replacePattern = [];

foreach ($languages as $language) {

  $replacePattern[] = '~(?<!\p{' . $language . '})#(\p{' . $language . '}+)~u';

}

$replacement = '<html>$1</html>';

$replaceText = preg_replace($replacePattern, $replacement, $text);

\w works great, but as far as I've seen it only covers Latin script.

Switch Latin for Cyrillic or Phoenician in the above example.

The above example does not work for 'RTL' scripts.
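
Whether \w is limited to Latin letters depends on the engine and its flags. As one data point outside PHP, Python's built-in `re` module treats \w as Unicode-aware for `str` patterns by default and only restricts it to ASCII when `re.ASCII` is passed. A minimal sketch, with a sample string made up for illustration:

```python
import re

text = "#text #привет #日本語"

# Default for str patterns: \w matches Unicode word characters.
unicode_tags = re.findall(r"(?<!\w)#(\w+)", text)
print(unicode_tags)  # ['text', 'привет', '日本語']

# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ascii_tags = re.findall(r"(?<!\w)#(\w+)", text, flags=re.ASCII)
print(ascii_tags)  # ['text']
```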

The code below should solve the case.

  • /\$(\w)+/g searches for words that start with $
  • /#(\w)+/g searches for words that start with #
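
A caveat on the group in these patterns: (\w)+ repeats a one-character group, so the group holds only the last character of the word. In JavaScript, `match` with the g flag returns whole matches, so this goes unnoticed there, but anything that reads group 1 is affected. A Python sketch of the difference, with a made-up sample string:

```python
import re

sample = "Something #else now, also #text."

# Repeated one-character group: only the last character is kept in group 1.
last_chars = re.findall(r"#(\w)+", sample)
print(last_chars)  # ['e', 't']

# Quantifier inside the group: the whole word is captured.
whole_words = re.findall(r"#(\w+)", sample)
print(whole_words)  # ['else', 'text']

# The full match is the same either way.
full_matches = [m.group(0) for m in re.finditer(r"#(\w)+", sample)]
print(full_matches)  # ['#else', '#text']
```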

The answer /(?<!\w)#\w+/ given by Mark Bayers triggers a warning like the following on the RegExr.com website:

"(?<!" The "negative lookbehind" feature may not be supported in all browsers.

The warning goes away if the pattern is changed to (?!\w)#\w+ by removing the <.
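
Note that (?!\w) is a negative lookahead, not a lookbehind, so the two patterns are not equivalent: the lookahead is tested against the # itself and always succeeds. The Python sketch below compares them on a made-up input. It also shows one lookbehind-free alternative, offered as an assumption rather than as part of any answer above, which consumes the preceding non-word character and captures the word in a group.

```python
import re

text = "foo#bar #ok"

# Lookbehind: '#' must not follow a word character.
lookbehind = re.findall(r"(?<!\w)#\w+", text)
print(lookbehind)  # ['#ok']

# Lookahead placed before '#': always true, so this behaves like '#\w+'.
lookahead = re.findall(r"(?!\w)#\w+", text)
print(lookahead)  # ['#bar', '#ok']

# No lookaround: require line start or a non-word character before '#'.
no_lookaround = re.findall(r"(?:^|\W)#(\w+)", text)
print(no_lookaround)  # ['ok']
```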
