
Racket pattern matching of lists

I am trying to do pattern matching with lists, but for some reason I get an unexpected match when I do the following:

> (define code '(h1 ((id an-id-here)) Some text here))
> (define code-match-expr '(pre ([class brush: python]) ...))
> (match code
    [code-match-expr #t]
    [_ #f])
#t

Question: Why does code match code-match-expr?

Practical use-case

I tried this in the Racket REPL, because I actually want to solve another practical problem: using Pollen's pygments wrapping functions to highlight code, which will be output as HTML later on. For this purpose I wrote the following code, where the problem occurs:

(define (read-post-from-file path)
  (Post-from-content (replace-code-xexprs (parse-markdown path))))

(define (replace-code-xexprs list-of-xexprs)
  ;; define known languages
  (define KNOWN-LANGUAGE-SYMBOLS
    (list 'python
          'racket
          'html
          'css
          'javascript
          'erlang
          'rust))
  ;; check if it matches for a single language's match expression
  ;; if it matches any language, return that language's name as a symbol
  (define (get-matching-language an-xexpr)
    (define (matches-lang-match-expr? an-xexpr lang-symbol)
      (display "XEXPR:") (displayln an-xexpr)
      (match an-xexpr
        [`(pre ([class brush: ,lang-symbol]) (code () ,more ...)) lang-symbol]
        [`(pre ([class brush: ,lang-symbol]) ,more ...) lang-symbol]
        [_ #f]))

    (ormap (lambda (lang-symbol)
             ;; (display "trying to match ")
             ;; (display an-xexpr)
             ;; (display " against ")
             ;; (displayln lang-symbol)
             (matches-lang-match-expr? an-xexpr lang-symbol))
           KNOWN-LANGUAGE-SYMBOLS))

  ;; replace code in an xexpr with highlightable code
  ;; TODO: What happens if the code is in a lower level of the xexpr?
  (define (replace-code-in-single-xexpr an-xexpr)
    (let ([matching-language (get-matching-language an-xexpr)])
      (cond [matching-language (code-highlight matching-language an-xexpr)]
            [else an-xexpr])))

  ;; apply the check to all xexpr
  (map replace-code-in-single-xexpr list-of-xexprs))

(define (code-highlight language code)
  (highlight language code))

In this example I am parsing a markdown file which has the following content:

# Code Demo

```python
def hello():
    print("Hello World!")
```

And I get the following xexprs:

1.

(h1 ((id code-demo)) Code Demo)

2.

(pre ((class brush: python)) (code () def hello():
    print("Hello World!")))

However, none of those match for some reason.

match is syntax and does not evaluate the pattern. Because code-match-expr is an identifier, it is treated as a pattern variable: it matches anything, and the whole value (the result of evaluating code) is bound to a new variable named code-match-expr before the clause body runs. The result is therefore always #t.

Notice that the second pattern, _, behaves the same way: it also matches the whole expression. The only difference is that _ is special and does not bind anything, whereas code-match-expr does.

Note that the code-match-expr you defined is never used. match binds a new variable with the same name, which shadows your original binding inside the clause body.
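The shadowing described above can be seen in a small sketch. The names expected, binds-anything and equal-to-expected? are made up for illustration. The second function uses the (== expr) pattern from racket/match, which is one way (not the only one) to compare against the value of an existing variable instead of binding a new one:

```racket
#lang racket

;; Hypothetical value we would like to compare against.
(define expected '(pre ((class brush: python))))

;; A bare identifier is a pattern variable: it matches any value and
;; binds it, shadowing the outer `expected` inside the clause body.
(define (binds-anything v)
  (match v
    [expected expected]))

;; (== expr) evaluates expr and matches only values equal? to it.
(define (equal-to-expected? v)
  (match v
    [(== expected) #t]
    [_ #f]))
```

Here (binds-anything '(h1)) simply returns its argument, while (equal-to-expected? '(h1)) is #f.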

Code that works as you intended might look like:

(define (test code)
  (match code 
    [`(pre ([class brush: python]) ,more ...) #t]
    [_ #f]))

(test '(h1 ((id an-id-here)) Some text here))
; ==> #f

(test '(pre ((class brush: python))))
; ==> #t

(test '(pre ((class brush: python)) a b c))
; ==> #t

As you can see, the pattern ,more ... means zero or more elements. The kind of bracket does not matter, since in Racket [] and {} are interchangeable with ().
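To make the "zero or more" part concrete, here is a small sketch that returns whatever ,more ... captured instead of #t. The function name rest-of-pre and the 'no-match result are illustrative choices, not part of the original code:

```racket
#lang racket

;; Returns the list of elements captured by `,more ...`,
;; or 'no-match when the quasi-pattern does not apply.
(define (rest-of-pre v)
  (match v
    [`(pre ([class brush: python]) ,more ...) more]
    [_ 'no-match]))

(rest-of-pre '(pre ((class brush: python))))         ; zero elements captured
(rest-of-pre '(pre [(class brush: python)] a b c))   ; brackets make no difference
```

The first call yields the empty list and the second yields the list (a b c), regardless of which bracket style the data was written with.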

EDIT

You still got it a little backwards. In this code:

(define (matches-lang-match-expr? an-xexpr lang-symbol)
  (display "XEXPR:") (displayln an-xexpr)
  (match an-xexpr
    [`(pre ([class brush: ,lang-symbol]) (code () ,more ...)) lang-symbol]
    [`(pre ([class brush: ,lang-symbol]) ,more ...) lang-symbol]
    [_ #f]))

When a pattern matches, the unquoted lang-symbol matches any value and is bound to it as a new variable in that clause. It has nothing to do with the function argument of the same name, because match does not use existing variables, it creates them. You then return that new variable. Thus:

(matches-lang-match-expr? '(pre ([class brush: jiffy]) bla bla bla) 'ignored-argument)
; ==> jiffy

Here is something that does what you want:

(define (get-matching-language an-xexpr)
  (define (get-language an-xexpr)
    (match an-xexpr
      [`(pre ([class brush: ,lang-symbol]) (code () ,more ...)) lang-symbol]
      [`(pre ([class brush: ,lang-symbol]) ,more ...) lang-symbol]
      [_ #f]))
  (let* ((matched-lang-symbol (get-language an-xexpr))
         (in-known-languages (memq matched-lang-symbol KNOWN-LANGUAGE-SYMBOLS)))
    (and in-known-languages (car in-known-languages))))
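As an alternative sketch, the membership check can be folded into the pattern itself with a (? pred pat) pattern. The helper known-language? is a hypothetical name introduced here, and this version assumes the language really is a symbol in the data, as in the question's patterns:

```racket
#lang racket

(define KNOWN-LANGUAGE-SYMBOLS
  '(python racket html css javascript erlang rust))

;; Hypothetical helper: #t only for symbols in the known list.
(define (known-language? v)
  (and (memq v KNOWN-LANGUAGE-SYMBOLS) #t))

;; (? known-language? lang) matches only when the predicate holds,
;; then binds the matched value to lang.
(define (get-matching-language an-xexpr)
  (match an-xexpr
    [`(pre ([class brush: ,(? known-language? lang)]) ,_ ...) lang]
    [_ #f]))
```

A single clause is enough in this sketch because ,_ ... already covers the case where the remaining element is a (code () ...) form.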

Again: match repurposes quasiquote for something completely different from building list structure. It uses quasiquote to match literals and to capture the unquoted symbols as variables.

Make sure you're clear about what you are matching. In Racket x-expressions, attribute names are symbols but attribute values are strings. So the expression you're matching would be something like (pre ([class "brush: js"]) ___), not (pre ([class brush: js]) ___).
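A likely reason the printed xexprs in the question look like they contain symbols is that displayln prints strings without their quotes. The sketch below uses ~a (display-style) and ~s (write-style) from racket/format to show the difference. The sample value is an assumption about what the Markdown parser returned:

```racket
#lang racket

;; Assumed shape of the parsed xexpr: the attribute value is a string.
(define sample '(pre ((class "brush: python")) "def hello():"))

;; ~a formats like display: quotes are dropped, so strings look like symbols.
(define shown-with-display (~a sample))

;; ~s formats like write: strings keep their quotes.
(define shown-with-write (~s sample))

(displayln shown-with-display)
(displayln shown-with-write)
```

Printing with write, writeln or ~s while debugging makes it clear whether a pattern needs to match a string or a symbol.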

To match that string and extract the part after "brush: ", you could use a pregexp match pattern. Here is a snippet that Frog uses to extract the language to give to Pygments:

(for/list ([x xs])
  (match x
    [(or `(pre ([class ,brush]) (code () ,(? string? texts) ...))
         `(pre ([class ,brush]) ,(? string? texts) ...))
     (match brush
       [(pregexp "\\s*brush:\\s*(.+?)\\s*$" (list _ lang))
        `(div ([class ,(str "brush: " lang)])
              ,@(pygmentize (apply string-append texts) lang
                            #:python-executable python-executable
                            #:line-numbers? line-numbers?
                            #:css-class css-class))]
       [_ `(pre ,@texts)])]
    [x x]))

(Here pygmentize is a function defined in other Frog source code; it's a wrapper around running Pygments as a separate process and piping text between it. But you could substitute another way of using Pygments or any other syntax highlighter. That's N/A for your question about match . I mention it just so that doesn't become a distraction and another embedded question. :))
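A stripped-down sketch of the same idea, without Pygments, might look like the following. It reuses the two quasi-patterns and the regular expression from the snippet above. The function name xexpr-language and the choice to return #f on failure are assumptions made for illustration:

```racket
#lang racket

;; Returns the language named in a "brush: ..." class attribute as a
;; string, or #f if the xexpr is not a matching pre block.
(define (xexpr-language x)
  (match x
    [(or `(pre ([class ,brush]) (code () ,(? string? texts) ...))
         `(pre ([class ,brush]) ,(? string? texts) ...))
     (match brush
       ;; pregexp matches the string, then the match result is
       ;; destructured as (list whole-match captured-group).
       [(pregexp "\\s*brush:\\s*(.+?)\\s*$" (list _ lang)) lang]
       [_ #f])]
    [_ #f]))

(xexpr-language
 '(pre ([class "brush: python"])
       (code () "def hello():\n    print(\"Hello World!\")")))
```

The result is a string such as "python", so comparing it against a list of known languages would need strings (or string->symbol) rather than the symbols used in KNOWN-LANGUAGE-SYMBOLS.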
