
Scala Regex Pattern Matching

I need to match a pattern in Scala using a regular expression, and I currently have this regex:

InputPattern: scala.util.matching.Regex = put (.*) in (.*)

When I do the following:

scala> val InputPattern(verb, item, prep, obj) = "put a in b";
scala.MatchError: put a in b (of class java.lang.String)
... 33 elided 

I need it to end up as verb("put"), item("a"), prep("in"), and obj("b") for the input "put a in b", and as verb("put"), item(""), prep("in"), and obj("") for the input "put in".

Thanks.

You can write a single regex that covers all cases, but I am not sure it would be readable and maintainable. I prefer the simple approach:

val pattern1 = "(put) (.*) (in) (.*)".r
val pattern2 = "(put) (in)".r
def parse(text: String) = text match { 
  case pattern1(verb, item, prep, obj) => (verb, item, prep, obj); 
  case pattern2(verb, prep) => (verb, "", prep, "") 
}
scala> parse("put a in b")
res6: (String, String, String, String) = (put,a,in,b)

scala> parse("put in")
res7: (String, String, String, String) = (put,"",in,"")
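
Note that the match in parse is not exhaustive, so any input that fits neither pattern throws a scala.MatchError. A minimal sketch of a total variant follows, assuming that returning None for unrecognized input is acceptable; the name parseOpt is hypothetical and not part of the original answer.

```scala
// Same two patterns as above; the capture groups line up with the four fields.
val pattern1 = "(put) (.*) (in) (.*)".r
val pattern2 = "(put) (in)".r

// Hypothetical variant of parse: returns None instead of throwing
// scala.MatchError when the text matches neither pattern.
def parseOpt(text: String): Option[(String, String, String, String)] =
  text match {
    case pattern1(verb, item, prep, obj) => Some((verb, item, prep, obj))
    case pattern2(verb, prep)            => Some((verb, "", prep, ""))
    case _                               => None
  }
```

Callers can then pattern match on the Option, or use getOrElse to supply a default command.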

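One more thought: I hope you know what you are doing! Regular expressions describe Chomsky Type 3 (regular) grammars, and natural language is far more complex. If you need a natural-language parser, you can use an existing solution such as the Stanford NLP parser.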

This works for your particular case:

scala> val InputPattern = "(put) (.*?) ?(in) ?(.*?)".r
InputPattern: scala.util.matching.Regex = (put) (.*?) ?(in) ?(.*?)

scala> val InputPattern(verb, item, prep, obj) = "put a in b"
verb: String = put
item: String = a
prep: String = in
obj: String = b

scala> val InputPattern(verb, item, prep, obj) = "put in"
verb: String = put
item: String = ""
prep: String = in
obj: String = ""

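Here put and in are also wrapped in capture groups so that they take part in the pattern match. The extractor binds one variable per group, and the original pattern had only two groups, which is why the four-variable extraction threw a MatchError. I also used the lazy quantifier (.*?) to capture as little as possible; you could replace it with (\\S*). The " ?" gives you an optional space so that "put in" matches (there is one space between put and in, and no space at the end).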

But note:

scala> val InputPattern(verb, item, prep, obj) = "put ainb"
verb: String = put
item: String = a
prep: String = in
obj: String = b

scala> val InputPattern(verb, item, prep, obj) = "put aininb"
verb: String = put
item: String = a
prep: String = in
obj: String = inb

scala> val InputPattern(verb, item, prep, obj) = "put ain"
verb: String = put
item: String = a
prep: String = in
obj: String = ""

That may be fine if you have a simple command interpreter; otherwise, you should match the special cases separately.
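
If inputs such as "put ainb" should be rejected instead, one option is to require whitespace around in and to make item and obj optional groups. The sketch below is only one possible approach, assuming each of item and obj is a single token without spaces; the names Strict and parseStrict are hypothetical.

```scala
// Hypothetical stricter pattern: "in" must be delimited by whitespace,
// and item and obj are each an optional single token without spaces.
val Strict = """(put)(?:\s+(\S+))?\s+(in)(?:\s+(\S+))?""".r

def parseStrict(text: String): Option[(String, String, String, String)] =
  text match {
    case Strict(verb, item, prep, obj) =>
      // An optional group that did not take part in the match is bound to null,
      // so wrap it in Option and fall back to the empty string.
      Some((verb, Option(item).getOrElse(""), prep, Option(obj).getOrElse("")))
    case _ => None
  }
```

With this pattern, "put a in b" and "put in" are still accepted, while "put ainb", "put aininb", and "put ain" no longer match.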

To process a simple (non-natural) language, you could also consider StandardTokenParsers, since they can handle context-free grammars (Chomsky Type 2):

import scala.util.parsing.combinator.syntactical._

val p = new StandardTokenParsers {
   lexical.reserved ++= List("put", "in") 
   def p = "put" ~ opt(ident) ~ "in" ~ opt(ident)
}

scala> p.p(new p.lexical.Scanner("put a in b"))
warning: there was one feature warning; re-run with -feature for details
res13 = [1.11] parsed: (((put~Some(a))~in)~Some(b))

scala> p.p(new p.lexical.Scanner("put in"))
warning: there was one feature warning; re-run with -feature for details
res14 = [1.7] parsed: (((put~None)~in)~None)
