Regular Expression to find separate words?
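Here's a quick one for the RegEx wizards. I need a regular expression that will find groups of words. Any group of words. For instance, I'd like it to find the first two words in any sentence.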
Example: "Hi there, how are you?" - the return would be "Hi there"
Example: "How are you doing?" - the return would be "How are"
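Try this:
^\w+\s+\w+
Explanation: starting at the beginning of the string (^), one or more word characters, then one or more whitespace characters, then one or more word characters.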
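To see what this pattern returns, here is a quick check using Python's re module. Python is used only because it is easy to run; for plain ASCII text, ^, \w and \s mean the same thing in .NET. The helper name first_two_words is hypothetical, chosen for this sketch.

```python
import re

# ^ anchors at the start; \w+ = one or more word chars; \s+ = one or more whitespace chars
FIRST_TWO = re.compile(r"^\w+\s+\w+")


def first_two_words(sentence):
    """Return the first two words of sentence, or None if the pattern does not match."""
    m = FIRST_TWO.match(sentence)
    return m.group(0) if m else None


print(first_two_words("Hi there, how are you?"))  # Hi there
print(first_two_words("How are you doing?"))      # How are
print(first_two_words("Hi, there"))               # None
```

The third call shows the pattern's limit. When a comma immediately follows the first word, \s+ has no whitespace to match at that point, so the whole match fails.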
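Regular expressions can be used to parse language, and they are a natural tool for it. After collecting the words, use a dictionary to check whether they are actually words in a particular language.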
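Here is a minimal sketch of the gather-then-check idea in Python. The candidate pattern [A-Za-z']+ and the tiny DICTIONARY set are hypothetical stand-ins. A real program would use a proper word pattern, such as the definitions that follow, and would load an actual word list.

```python
import re

# Hypothetical stand-in for a real word list (e.g. loaded from a dictionary file).
DICTIONARY = {"hi", "there", "how", "are", "you"}


def known_words(text):
    """Collect runs of ASCII letters/apostrophes, then keep only those found in DICTIONARY."""
    candidates = re.findall(r"[A-Za-z']+", text)
    return [w for w in candidates if w.lower() in DICTIONARY]


print(known_words("Hi there, how are yuo?"))  # ['Hi', 'there', 'how', 'are']
```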
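The premise is to define a regular expression that splits out 99.9% of possible words; how you define a "word" is the key.
I'm assuming C# uses a PCRE-style engine based on Perl 5.8. (Note: .NET actually has its own regex engine. It does not support POSIX classes such as [:punct:], and it writes Unicode categories as \p{L} and \p{P} rather than \pL and \pP, so the patterns below need adapting for C#.)
Here is my ASCII definition of how to split words (written for extended /x mode):
regex = '[\\s[:punct:]]* (\\w (?: \\w | [[:punct:]](?=[\\w[:punct:]]) )* )'
And the Unicode version (add or remove classes to suit your specific encoding):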
regex = '[\\s\\pP]* ([\\pL\\pN_-] (?: [\\pL\\pN_-] | \\pP(?=[\\pL\\pN\\pP_-]) )* )'
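To find all words, interpolate the regex string into a match operator (this is Perl; I don't know C#):
my @matches = $text =~ /$regex/xg;
/x and /g are the extended and global modifiers. Note that the regex string contains only capture group 1, so the text between words is not captured.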
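For comparison, here is a sketch of the ASCII definition translated to Python's re module. Python has no POSIX [:punct:] class, so the sketch assumes string.punctuation as a stand-in. That constant holds the same 32 characters [:punct:] covers in the C locale. The sketch also uses re.ASCII to keep \w and \s ASCII-only. The pattern has exactly one capture group, so re.findall returns just the captured words, which mirrors Perl's /g match in list context. The sample is the junk string from the Perl example below, with écafé replaced by cafe to stay ASCII.

```python
import re
import string

# Python's re has no POSIX [:punct:], so build an ASCII equivalent from string.punctuation.
PUNCT = re.escape(string.punctuation)

WORD = re.compile(
    rf"[\s{PUNCT}]*"                           # skip whitespace/punctuation before a word
    rf"(\w(?:\w|[{PUNCT}](?=[\w{PUNCT}]))*)",  # group 1: keep inner punctuation only when
    re.ASCII,                                  # another word/punct char follows it
)


def all_words(text):
    # With exactly one capture group, findall returns group 1 of every match.
    return WORD.findall(text)


print(all_words("Hi, there, A cafe and Horse d'oeuvre hasn't? 'n? '? a-b? -'a-?"))
# ['Hi', 'there', 'A', 'cafe', 'and', 'Horse', "d'oeuvre", "hasn't", 'n', 'a-b', 'a-']
```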
To find only the first two words:
my @matches = $text =~ /(?:$regex)(?:$regex)/x;
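The same first-two idea can be sketched in Python. Repeating the word pattern produces two capture groups. This sketch uses re.match, which anchors at the start of the string, a small deliberate difference from the unanchored Perl match. PUNCT, WORD, FIRST_TWO and first_two are illustrative names, and string.punctuation again stands in for [:punct:].

```python
import re
import string

PUNCT = re.escape(string.punctuation)  # ASCII stand-in for POSIX [:punct:]
# One word with its leading whitespace/punctuation skipped; group 1 captures the word.
WORD = rf"[\s{PUNCT}]*(\w(?:\w|[{PUNCT}](?=[\w{PUNCT}]))*)"

# Repeating the word pattern gives two capture groups: the first and second word.
FIRST_TWO = re.compile(rf"(?:{WORD})(?:{WORD})", re.ASCII)


def first_two(text):
    m = FIRST_TWO.match(text)  # match() anchors at the start of the string
    return m.groups() if m else None


print(first_two("Hi, there, A cafe and Horse"))  # ('Hi', 'there')
```

Watch out for one case. When the text holds only one word, backtracking lets the pattern split that word, so first_two("Hello") returns ('Hell', 'o'). The Perl pattern backtracks the same way, so check for this if single-word input is possible.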
Below is a Perl example. Play around with it. Cheers!
use strict;
use warnings;
use utf8;                  # the source contains non-ASCII literals such as 'écafé'
binmode(STDOUT, ':utf8');
# Unicode
my $regex = qr/ [\s\pP]* ([\pL\pN_-] (?: [\pL\pN_-] | \pP(?=[\pL\pN\pP_-]) )* ) /x;
# Ascii
# my $regex = qr/ [\s[:punct:]]* (\w (?: \w | [[:punct:]](?=[\w[:punct:]]) )* ) /x;
my $text = q(
I confirm that sufficient information and detail have been
reported in this technical report, that it's "scientifically" sound,
and that appropriate conclusion's have been included
);
print "\n**\n$text\n";
my @matches = $text =~ /$regex/g;
print "\nTotal ".scalar(@matches)." words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}
# =======================================
my $junk = q(
Hi, there, A écafé and Horse d'oeuvre
hasn't? 'n? '? a-b? -'a-?
);
print "\n\n**\n$junk\n";
# First 2 words
@matches = $junk =~ /(?:$regex)(?:$regex)/;
print "\nFirst 2 words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}
# All words
@matches = $junk =~ /$regex/g;
print "\nTotal ".scalar(@matches)." words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}
Output:
**
I confirm that sufficient information and detail have been
reported in this technical report, that it's "scientifically" sound,
and that appropriate conclusion's have been included
Total 25 words
--------------------
I
confirm
that
sufficient
information
and
detail
have
been
reported
in
this
technical
report
that
it's
scientifically
sound
and
that
appropriate
conclusion's
have
been
included
**
Hi, there, A écafé and Horse d'oeuvre
hasn't? 'n? '? a-b? -'a-?
First 2 words
--------------------
Hi
there
Total 11 words
--------------------
Hi
there
A
écafé
and
Horse
d'oeuvre
hasn't
n
a-b
a-
@Rubens Farias:
Following up on my comment, here is the code I used:
// Inside a WinForms Form class; requires: using System.Text.RegularExpressions;
public int startAt = 0;
private void btnGrabWordPairs_Click(object sender, EventArgs e)
{
    // Start at a word boundary, find one or more word chars, one or more whitespace chars,
    // one or more word chars, and end at a word boundary.
    Regex regex = new Regex(@"\b\w+\s+\w+\b");
    if (startAt <= txtTest.Text.Length)
    {
        Match match = regex.Match(txtTest.Text, startAt);
        if (match.Success)
        {
            MessageBox.Show(match.Value);
            // Resume at the end of this match; the match need not begin exactly at startAt.
            startAt = match.Index + match.Length;
        }
    }
}
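Each time the button is clicked, it grabs the next pair of words, stepping through the text in the txtTest TextBox and finding pairs in order until it reaches the end of the string.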
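The click-by-click walk can be sketched as a Python loop, where each iteration plays the role of one button click. The helper word_pairs is hypothetical. Like the C# handler, it resumes searching where the previous match ended (m.end(), the match's end index). It stops when no further pair is found instead of repeating an empty result.

```python
import re

# Same pattern as the C# handler: word boundary, word chars, whitespace, word chars, boundary.
PAIR = re.compile(r"\b\w+\s+\w+\b")


def word_pairs(text):
    """Collect successive word pairs, resuming each search where the previous match ended."""
    pairs = []
    pos = 0
    while pos <= len(text):
        m = PAIR.search(text, pos)
        if m is None:      # no further pair: stop
            break
        pairs.append(m.group(0))
        pos = m.end()      # end index of the match, not pos + len(match)
    return pairs


print(word_pairs("Hi there, how are you doing today"))
# ['Hi there', 'how are', 'you doing']
```

The trailing single word (today) is left over, because \w+\s+\w+ always needs two words.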
// @sln: Thank you very much for the detailed answer!