Regular expression to find separate words?

Question

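Here's a quick one for the RegEx wizards. I need a regular expression that can find groups of words. Any group of words. For example, I'd like it to find the first two words in any sentence.

Ex: "Hi, how are you?" - return would be "Hi how"

Ex: "How are you?" - return would be "How are"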

Answer 1

Try this:

^\w+\s+\w+

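Explanation: starting at the beginning of the string (^), one or more word characters, then one or more whitespace characters, then one or more word characters.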
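To see what this pattern does and does not match, here is a minimal Python sketch (the thread itself targets C#; the function name `leading_pair` is made up for illustration). Because `\s+` must come directly after the first word, a sentence such as "Hi, how are you?" yields no match: the comma sits between the first word and the whitespace.

```python
import re

# Answer 1's pattern: anchored at the start, word chars, whitespace, word chars.
LEADING_PAIR = re.compile(r"^\w+\s+\w+")


def leading_pair(sentence):
    """Return the first two words if only whitespace separates them, else None."""
    match = LEADING_PAIR.search(sentence)
    return match.group() if match else None


print(leading_pair("How are you?"))      # matched: the words are separated by a space
print(leading_pair("Hi, how are you?"))  # no match: a comma follows the first word
```

If punctuation between words has to be tolerated, the separator needs to allow more than `\s`.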

Answer 2

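Regular expressions can be used to parse language. Regular expressions are a more natural tool. After collecting the words, use a dictionary to check whether they are actually words in a particular language.

The premise is to define a regular expression that will split out 99.9% of possible words, "word" being the key definition.

I assume C# is going to use a PCRE based on Perl 5.8.
Here is my ASCII definition of how to split out words (expanded):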

regex = '[\\s[:punct:]]* (\\w (?: \\w | [[:punct:]](?=[\\w[:punct:]]) )* )'

And Unicode (more must be added or subtracted to suit the specific encoding):

regex = '[\\s\\pP]* ([\\pL\\pN_-] (?: [\\pL\\pN_-] | \\pP(?=[\\pL\\pN\\pP_-]) )* )'

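To find all of the words, put the regex string into a regex (I don't know C#):

@matches = $text =~ /$regex/xg

where /xg are the extended and global modifiers. Note that the regex string contains only capture group 1, so the intervening text is not captured.

To find just the first two:

@matches = $text =~ /(?:$regex)(?:$regex)/x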
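For readers who don't use Perl, the same idea can be sketched in Python. This is an approximation rather than a port: Python's `re` module supports neither `[:punct:]` nor `\pL`, so the sketch assumes ASCII punctuation from `string.punctuation` and relies on `\w` for letters and digits. The names `all_words` and `first_two_words` are made up for illustration.

```python
import re
import string

# Assumption: ASCII punctuation stands in for [:punct:] / \pP.
PUNCT = re.escape(string.punctuation)

# Skip leading whitespace and punctuation, then capture a word: one word
# character, followed by word characters or by punctuation that is itself
# followed by a word character or more punctuation.
WORD = rf"[\s{PUNCT}]*(\w(?:\w|[{PUNCT}](?=[\w{PUNCT}]))*)"

ALL_WORDS = re.compile(WORD)
FIRST_TWO = re.compile(f"(?:{WORD})(?:{WORD})")


def all_words(text):
    """Return every captured word; only group 1 is kept, separators are dropped."""
    return ALL_WORDS.findall(text)


def first_two_words(text):
    """Return the first two words as a tuple (assumes the text has at least two)."""
    match = FIRST_TWO.search(text)
    return match.groups() if match else None


print(all_words('it\'s "scientifically" sound, and conclusion\'s'))
print(first_two_words("Hi, how are you?"))
```

One caveat with the doubled pattern: when the text holds a single word of two or more characters, backtracking lets the two copies split that word between them, so it is only reliable when at least two words are present.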

Below is a Perl sample. Anyway, play around with it. Cheers!

use strict;
use warnings;
use utf8;   # the source contains a literal "é", so treat this file as UTF-8

binmode (STDOUT,':utf8');

# Unicode
my $regex = qr/ [\s\pP]* ([\pL\pN_-] (?: [\pL\pN_-] | \pP(?=[\pL\pN\pP_-]) )* ) /x;

# Ascii
# my $regex = qr/ [\s[:punct:]]* (\w (?: \w | [[:punct:]](?=[\w[:punct:]]) )* ) /x;


my $text = q(
  I confirm that sufficient information and detail have been
  reported in this technical report, that it's "scientifically" sound,
  and that appropriate conclusion's have been included
);
print "\n**\n$text\n"; 

my @matches = $text =~ /$regex/g;
print "\nTotal ".scalar(@matches)." words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}

# =======================================

my $junk = q(
Hi, there, A écafé and Horse d'oeuvre 
hasn't? 'n? '? a-b? -'a-? 
);
print "\n\n**\n$junk\n"; 

# First 2 words
@matches = $junk =~ /(?:$regex)(?:$regex)/;
print "\nFirst 2 words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}

# All words
@matches = $junk =~ /$regex/g;
print "\nTotal ".scalar(@matches)." words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}

Output:
**

I confirm that sufficient information and detail have been
reported in this technical report, that it's "scientifically" sound,
and that appropriate conclusion's have been included

Total 25 words
--------------------
I
confirm
that
sufficient
information
and
detail
have
been
reported
in
this
technical
report
that
it's
scientifically
sound
and
that
appropriate
conclusion's
have
been
included

**

Hi, there, A écafé and Horse d'oeuvre
hasn't? 'n? '? a-b? -'a-?

First 2 words
--------------------
Hi
there

Total 11 words
--------------------
Hi
there
A
écafé
and
Horse
d'oeuvre
hasn't
n
a-b
a-

Answer 3

@Rubens Farias:

Per my comment, here's the code I used:

// Requires: using System.Text.RegularExpressions;
public int startAt = 0;

private void btnGrabWordPairs_Click(object sender, EventArgs e)
{
    // Start at a word boundary, find one or more word chars, one or more
    // whitespace chars, one or more word chars, and end at a word boundary.
    Regex regex = new Regex(@"\b\w+\s+\w+\b");

    if (startAt <= txtTest.Text.Length)
    {
        Match match = regex.Match(txtTest.Text, startAt);
        if (match.Success)
        {
            MessageBox.Show(match.Value);
            // Move the starting position to the end of this match.
            startAt = match.Index + match.Length;
        }
    }
}

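Each time the button is clicked, it grabs a pair of words nicely, working through the text in the txtTest TextBox and finding pairs sequentially until it reaches the end of the string.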
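The same walk through the text can be sketched without a UI. The Python below is an illustration, not the poster's code, and `word_pairs` is a made-up name. It searches from a stored offset and then moves that offset to the end of the match, which matters because a match may begin after the offset when punctuation is skipped.

```python
import re

# Same pattern as the C# snippet: word, whitespace, word, bounded by \b.
PAIR = re.compile(r"\b\w+\s+\w+\b")


def word_pairs(text):
    """Collect successive non-overlapping word pairs, left to right."""
    pairs = []
    start_at = 0
    while True:
        match = PAIR.search(text, start_at)
        if match is None:
            break
        pairs.append(match.group())
        # Advance to the end of the match, not by the match length alone.
        start_at = match.end()
    return pairs


print(word_pairs("one two three four five"))
print(word_pairs("Hi, how are you?"))
```

In Python, `PAIR.finditer(text)` would produce the same sequence in one call. The explicit offset is kept here only to mirror the click-by-click behaviour.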

@sln: Thank you very much for the detailed answer!

3 solutions

Solution 1
4 Accepted 2011-02-05 11:07:27

Solution 2
2

Solution 3
0 2011-02-09 12:59:17
