使用Perl Regex解析語法樹

Question

也許正則表達式不是解析此問題的最佳方法，請告訴我是否不是。 無論如何，這是語法樹的一些示例：

(S (CC and))  
(SBARTMP (IN once) (NP otherstuff))   
(S (S (NP blah (VP blah)) (CC then) (NP blah (VP blah (PP blah))) ))

無論如何，我想做的是拔出連接詞（然后是一次，依此類推）及其對應的頭（CC，IN，CC），我已經知道每個語法樹，因此它可以充當錨，並且我還需要檢索其父級（第一個是S，第二個是SBARTMP，第三個是S），以及是否有兄弟姐妹（在第一個兄弟姐妹中，在第二個左側兄弟姐妹中，還有第三個兄弟姐妹）左側和右側同級）。 比父母更高的東西都不包括在內

my $pos = "(\\\w|-)*";  
my $sibling = qr{\s*(\\((?:(?>[^()]+)|(?1))*\\))\s*};  
my $connective = "once";  
my $re = qr{(\(\w*\s*$sibling*\s*\\(IN\s$connective\\)\s*$sibling*\s*\))};

此代碼適用於以下情況：

my $test1 = "(X (SBAR-TMP (IN once) (S sdf) (S sdf)))";  
my $test2 = "(X (SBAR-TMP (IN once))";  
my $test3 = "(X (SBAR-TMP (IN once) (X as))";  
my $test4 = "(X (SBAR-TMP (X adsf) (IN once))";

它將把X扔掉，並保留其他所有內容，但是，一旦兄弟姐妹中嵌入了某些東西，則它不匹配，因為正則表達式不會更深入。

my $test = "(X (SBAR-TMP (IN once) (MORE stuff (MORE stuff))))";

我不確定該如何解決。 我是Perl擴展模式的新手，剛剛開始學習它。 為了澄清一下正則表達式的功能：它在兩個括號內查找連接詞和大寫字母/-組合，在兩個括號內尋找具有相同格式的完整父級，然后應查找任意數量的同級他們所有的括號都配對了。

Answer 1

要僅獲取與錨定連接詞最接近的“父”，可以將其作為具有FAIL的遞歸父或直接進行。 （由於某種原因，我無法編輯其他帖子，必須刪除Cookie）。

use strict;
use warnings;

my $connective = qr/ \((?:IN|CC)\s(?:once|and|then)\)/x;
my $sibling = qr/
  \s*
  ( 
     (?! $connective )
     \(
        (?:
            (?> (?: [^()]+ ) )
          | (?-1)
        )*
     \)
  )
  \s*
 /x;

my $regex1 = qr/
      \( ( [\w-]+ \s* $sibling* \s* $connective \s* $sibling* ) \) #1
 /x;

my $regex2 = qr/
   ( #1
     \( \s*
        (  #2
           [\w-]+ \s*
           (?>   $sibling* \s* $connective (?(R)(*FAIL)) \s* $sibling*
               | (?1)
           )
        )
        \s*
     \)
   )
 /x;


my $sample = qq/
 (X (SBAR-TMP (IN once) (S sdf) (S sdf)))
 (X (SBAR-TMP (IN once))
 (X (SBAR-TMP (IN once) (X as))
 (X (SBAR-TMP (X adsf) (IN once))
 (X (SBAR-TMP (IN once) (MORE stuff (MORE stuff))))
 (S (CC and))  
 (SBARTMP (IN once) (NP otherstuff))   
 (S (S (NP blah (VP blah)) (CC then) (NP blah (VP blah (PP blah))) ))
/;

while ($sample =~ /$regex1/xg) {
    print "Found:   $1\n";
}
print '-' x 20, "\n";

while ($sample =~ /$regex2/xg) {
    print "Found:   $2\n";
}

__END__

Answer 2

你為什么要放棄這個，你幾乎做到了。 嘗試這個：

use strict;
use warnings;

 my $connective = qr/(?: \((?:IN|CC)\s(?:once|and|then)\) )/x;
 my $sibling = qr/
  \s*
  ( 
     (?!$connect)
     \(
        (?:
            (?> (?: [^()]+ ) )
          | (?-1)
        )*
     \)
  )
  \s*
 /x;

 my $regex = qr/
   ( #1
     \(
        \s* [\w-]+ \s*
        (?>   $sibling* \s* $connective \s* $sibling*
            | (?1)
        )
      \s*
     \)
   )
 /x;


my @tests = (
  '(X (SBAR-TMP (IN once) (S sdf) (S sdf)))',  
  '(X (SBAR-TMP (IN once))',
  '(X (SBAR-TMP (IN once) (X as))',
  '(X (SBAR-TMP (X adsf) (IN once))',
);

for my $sample (@tests)
{
    while ($sample =~ /$regex/xg) {
         print "Found:   $1\n";
    }
}

my $another =<<EOS;
(S (CC and))  
(SBARTMP (IN once) (NP otherstuff))   
(S
  (S
    (NP blah
      (VP blah)
    )
    (CC then)
    (NP blah
      (VP blah
        (PP blah)
      )
    )
  )
)
EOS

print "\n---------\n";
    while ($another =~ /$regex/xg) {
         print "\nFound:\n$1\n";
    }

結束

Answer 3

這也應該工作

use strict;
use warnings;

my $connective = qr/(?: \((?:IN|CC)\s(?:once|and|then)\) )/x;
my $sibling = qr/
  (?: \s*
  ( 
     (?!$connective)
     \(
        (?:
            (?> (?: [^()]+ ) )
          | (?-1)
        )*
     \)
  )
  \s* )
 /x;

my $regex = qr/
   ( #1
     \( \s*
        (  #2
           [\w-]+ \s*
           (?>   $sibling* \s* $connective (?(R)(*FAIL)) \s* $sibling*
               | (?1)
           )
        )
        \s*
     \)
   )
 /x;


my @tests = (
  '(X (SBAR-TMP (IN once) (S sdf) (S sdf)))',  
  '(X (SBAR-TMP (IN once))',
  '(X (SBAR-TMP (IN once) (X as))',
  '(X (SBAR-TMP (X adsf) (IN once))',
  '(X (SBAR-TMP (IN once) (MORE stuff (MORE stuff))))',    
);

for my $sample (@tests)
{
    while ($sample =~ /$regex/xg) {
        print "Found:   $2\n";
    }
}

my $another = "
(S (CC and))  
(SBARTMP (IN once) (NP otherstuff))   
(S (S (NP blah (VP blah)) (CC then) (NP blah (VP blah (PP blah))) ))
";

print "\n---------\n";
while ($another =~ /$regex/xg) {
    print "\nFound:\n$2\n";
}

__END__

使用Perl Regex解析語法樹

問題描述

3 個解決方案

解決方案1
1 已采納

解決方案2
0

解決方案3
0

使用Perl Regex解析語法樹

問題描述

3 個解決方案

解決方案1 1 已采納

解決方案2 0

解決方案3 0

解決方案1
1 已采納

解決方案2
0

解決方案3
0