Simple pattern matching in Clojure

I have a string in Clojure and I'd like to name and extract various parts of a match. The standard way to do this is:

(re-seq #"\d{3}-\d{4}" "My phone number is 000-1234")
;; returns ("000-1234")

However, I want to be able to name and access just the matched parts.

Here's an example:

(def mystring "Find sqrt of 6 and the square of 2")
(def patterns '(#"sqrt of \d" #"square of \d"))

When I match mystring against my list of patterns, I'd like the result to be something like {:sqrt 6, :root 2}.

Update

I found a third-party package, https://github.com/rufoa/named-re, that supports named groups, but I was hoping there was a solution within a core library.

You can do it using the named groups of Java's regular expressions. The problem is that there is no API to get all the groups' names, so you will have to extract them from the regexp itself:

(defn find-named [re s]
  (let [m (re-matcher re s)
        names (map second (re-seq #"\(\?<([\w\d]+)>" (str re)))]
    (when (.find m)
      (into {} (map (fn [name]
                      [(keyword name) (.group m name)])
                    names)))))

In the REPL:

user> (find-named #"sqrt of (?<sqrt>\d).*?square of (?<root>\d)"
                  "Find sqrt of 6 and the square of 2")
{:sqrt "6", :root "2"}

user> (find-named #"sqrt of (?<sqrt>\d).*?square of (?<root>\d)"
                  "Find sqrt of 6 and the square of fff")
nil

Update:

The conversation led me to think that you don't really need named groups here, but rather named patterns:

user> 
(defn get-named [patterns s]
  (into {} (for [[k ptrn] patterns]
             [k (second (re-find ptrn s))])))
#'user/get-named

user> (get-named {:sq #"sqrt of (\d)"
                  :rt #"square of (\d)"}
                 "Find sqrt of 6 and the square of 2")
{:sq "6", :rt "2"}

user> (get-named {:sq #"sqrt of (\d)"
                  :rt #"square of (\d)"}
                 "Find sqrt of 6 and the square of xxx")
{:sq "6", :rt nil}
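The question asks for numbers ({:sqrt 6, :root 2}) rather than strings. A minimal sketch of one way to get there, assuming every pattern has exactly one capturing group that matches only digits: the helper name get-named-numbers is hypothetical, and it drops keys whose pattern does not match instead of mapping them to nil.

```clojure
;; Hypothetical variant of get-named, not part of the original answer.
;; Assumes each pattern has one capturing group containing only digits.
(defn get-named-numbers
  [patterns s]
  (into {}
        (for [[k ptrn] patterns
              ;; re-find returns [whole-match group-1] or nil
              :let [[_ digits] (re-find ptrn s)]
              ;; skip patterns that did not match
              :when digits]
          [k (Long/parseLong digits)])))

(get-named-numbers {:sqrt #"sqrt of (\d+)"
                    :root #"square of (\d+)"}
                   "Find sqrt of 6 and the square of 2")
;; => {:sqrt 6, :root 2}
```

Whether to omit unmatched keys or keep them with a nil value is a design choice; the original get-named keeps them.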

You need to capture the pattern you want, e.g.:

(re-seq #"sqrt of (\d)" "Find sqrt of 6")

Or, if you want just the first group of the match:

(def matcher (re-matcher #"sqrt of (\d)" "Find sqrt of 6"))
(re-find matcher)             ; advances the matcher to the first match
(second (re-groups matcher))  ; => "6"

See the docs for re-groups.
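If the stateful matcher is not needed elsewhere, the same first group can probably be read straight from re-find. A small sketch, assuming the pattern contains at least one capturing group (without one, re-find returns a plain string and second would return its second character); first-group is a hypothetical name.

```clojure
;; Hypothetical helper: first capturing group of the first match, or nil.
;; Assumes `re` has at least one capturing group.
(defn first-group
  [re s]
  ;; re-find returns [whole-match group-1 ...] or nil; (second nil) is nil
  (second (re-find re s)))

(first-group #"sqrt of (\d)" "Find sqrt of 6") ; => "6"
(first-group #"sqrt of (\d)" "no match here")  ; => nil
```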

As for naming captured groups, I didn't look too carefully at the library you mentioned in the question, but I would think the only practical difference is that a capturing group is assigned a name rather than being referenced only by its numeric left-to-right position (starting from 1) in the regex.
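To illustrate that difference with plain Java interop rather than the library, here is a rough sketch that reads the same group once by name and once by position. The function name sqrt-digit-both-ways is made up for this example; the (?<sqrt>...) syntax and Matcher's group method are standard java.util.regex.

```clojure
;; Hypothetical example: one group, looked up by name and by index.
(defn sqrt-digit-both-ways
  [s]
  (let [m (re-matcher #"sqrt of (?<sqrt>\d)" s)]
    (when (.find m)
      {:by-name  (.group m "sqrt") ; named lookup
       :by-index (.group m 1)})))  ; positional lookup, groups count from 1

(sqrt-digit-both-ways "Find sqrt of 6")
;; => {:by-name "6", :by-index "6"}
```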

Depending on what you intend to do with the 'named matches', you may also find it useful to simply destructure the matches and bind them to symbols.

For a single match:

(if-let [[_ digit letter] (re-find #"(\d)([a-z])" "1x 2y 3z")]
  [digit letter])  ; => ["1" "x"]

For multiple matches:

(for [[_ digit letter] (re-seq #"(\d)([a-z])" "1x 2y 3z")]
  [digit letter])  ; => (["1" "x"] ["2" "y"] ["3" "z"])
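If a map with named keys is preferred over local symbols, the same destructuring idea can be combined with zipmap. This is only a sketch: match->map is a hypothetical helper, and it assumes the keys are listed in the same left-to-right order as the capturing groups in the pattern.

```clojure
;; Hypothetical helper: pair caller-supplied keys with the captured groups.
;; Assumes `ks` follows the order of the capturing groups in `re`.
(defn match->map
  [ks re s]
  ;; drop the whole-match element, keep only the groups
  (when-let [[_ & groups] (re-find re s)]
    (zipmap ks groups)))

(match->map [:digit :letter] #"(\d)([a-z])" "1x 2y 3z")
;; => {:digit "1", :letter "x"}
```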
