
Parsing command-line arguments from a STRING in Clojure

I'm in a situation where I need to parse arguments from a string in the same way that they would be parsed if provided on the command line to a Java/Clojure application.

For example, I need to turn "foo \"bar baz\" 'fooy barish' foo" into ("foo" "bar baz" "fooy barish" "foo").

I'm curious whether there is a way to use the parser that Java or Clojure uses to do this. I'm not opposed to using a regex, but I suck at regexes, and I'd fail hard if I tried to write one for this.

Any ideas?

Updated with a new, even more convoluted version. This is officially ridiculous; the next iteration will use a proper parser (or ccmonads and a little bit of Parsec-like logic on top of that). See the revision history on this answer for the original.

This convoluted bunch of functions seems to do the trick (not at my DRYest with this one, sorry!):

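(require '[clojure.string :as str])

;; Split the input into blocks: first around quotes and backslashes,
;; then around whitespace, keeping every delimiter as its own block.
(defn initial-state [input]
  {:expecting nil
   :blocks (mapcat #(str/split % #"(?<=\s)|(?=\s)")
                   (str/split input #"(?<=(?:'|\"|\\))|(?=(?:'|\"|\\))"))
   :arg-blocks []})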

(defn arg-parser-step [s]
  (if-let [bs (seq (:blocks s))]
    (if-let [d (:expecting s)]
      (loop [bs bs]
        (cond (= (first bs) d)
              [nil (-> s
                       (assoc-in [:expecting] nil)
                       (update-in [:blocks] next))]
              (= (first bs) "\\")
              [nil (-> s
                       (update-in [:blocks] nnext)
                       (update-in [:arg-blocks]
                                  #(conj (pop %)
                                         (conj (peek %) (second bs)))))]
              :else
              [nil (-> s
                       (update-in [:blocks] next)
                       (update-in [:arg-blocks]
                                  #(conj (pop %) (conj (peek %) (first bs)))))]))
      (cond (#{"\"" "'"} (first bs))
            [nil (-> s
                     (assoc-in [:expecting] (first bs))
                     (update-in [:blocks] next)
                     (update-in [:arg-blocks] conj []))]
            (str/blank? (first bs))
            [nil (-> s (update-in [:blocks] next))]
            :else
            [nil (-> s
                     (update-in [:blocks] next)
                     (update-in [:arg-blocks] conj [(.trim (first bs))]))]))
    [(->> (:arg-blocks s)
          (map (partial apply str)))
     nil]))

(defn split-args [input]
  (loop [s (initial-state input)]
    (let [[result new-s] (arg-parser-step s)]
      (if result result (recur new-s)))))

Somewhat encouragingly, the following yields true:

(= (split-args "asdf 'asdf \" asdf' \"asdf ' asdf\" asdf")
   '("asdf" "asdf \" asdf" "asdf ' asdf" "asdf"))

So does this:

(= (split-args "asdf asdf '  asdf \" asdf ' \" foo bar ' baz \" \" foo bar \\\" baz \"")
   '("asdf" "asdf" "  asdf \" asdf " " foo bar ' baz " " foo bar \" baz "))

Hopefully this trims regular arguments but not quoted ones, and handles both double and single quotes, including backslash-escaped double quotes inside double-quoted arguments. (Note that it currently treats escaped single quotes inside single-quoted arguments the same way, which is apparently at variance with the *nix shell way... argh.) It's basically a computation in an ad-hoc state monad, just written in a particularly ugly way and in dire need of DRYing up. :-P
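For comparison, the same idea can be sketched as a single character-by-character loop. This is only a minimal sketch, not the code from the answer: `tokenize-args` is a hypothetical name, and it assumes that a backslash escapes the next character everywhere (inside either kind of quote and outside quotes), which is looser than what a real shell does.

```clojure
(defn tokenize-args
  "Splits s into a vector of arguments, roughly the way a shell would.
  Hypothetical helper; assumes a backslash always escapes the next character."
  [s]
  (loop [chars      (seq s)
         quote-char nil   ; the quote character we are currently inside, or nil
         current    nil   ; vector of chars for the argument being built, or nil
         args       []]
    (if-let [c (first chars)]
      (cond
        ;; Backslash: take the following character literally.
        (= c \\)
        (if-let [nxt (second chars)]
          (recur (nnext chars) quote-char (conj (or current []) nxt) args)
          ;; A trailing backslash is kept as-is.
          (recur nil quote-char (conj (or current []) c) args))

        ;; Inside quotes: only the matching quote character ends the quoted part.
        quote-char
        (if (= c quote-char)
          (recur (next chars) nil current args)
          (recur (next chars) quote-char (conj current c) args))

        ;; Opening quote: start (or continue) an argument.
        (or (= c \") (= c \'))
        (recur (next chars) c (or current []) args)

        ;; Whitespace outside quotes ends the current argument, if any.
        (Character/isWhitespace (char c))
        (recur (next chars) nil nil
               (if current (conj args (apply str current)) args))

        ;; Any other character belongs to the current argument.
        :else
        (recur (next chars) nil (conj (or current []) c) args))
      ;; End of input: flush the last argument, if any.
      (if current (conj args (apply str current)) args))))

(tokenize-args "foo \"bar baz\" 'fooy barish' foo")
```

Keeping the open quote character in the loop state plays the same role as the `:expecting` key in the state map above, without the regex pre-splitting step.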

This bugged me, so I got it working in ANTLR. The grammar below should give you an idea of how to do it. It includes rudimentary support for backslash escape sequences.

Getting ANTLR working in Clojure is too much to write in this text box. I wrote a blog entry about it though.

grammar Cmd;

options {
    output=AST;
    ASTLabelType=CommonTree;
}

tokens {
    DQ = '"';
    SQ = '\'';
    BS = '\\';
}

@lexer::members {
    String strip(String s) {
        return s.substring(1, s.length() - 1);
    }
}

args: arg (sep! arg)* ;
arg : BAREARG
    | DQARG 
    | SQARG
    ;
sep :   WS+ ;

DQARG  : DQ (BS . | ~(BS | DQ))+ DQ
        {setText( strip(getText()) );};
SQARG  : SQ (BS . | ~(BS | SQ))+ SQ
        {setText( strip(getText()) );} ;
BAREARG: (BS . | ~(BS | WS | DQ | SQ))+ ;

WS  :   ( ' ' | '\t' | '\r' | '\n');

I ended up doing this:

(require '[clojure.string :as s])

(filter seq
        (flatten
         (map #(%1 %2)
              (cycle [#(s/split % #" ") identity])
              (s/split (read-line) #"(?<!\\)(?:'|\")"))))
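To try the snippet above without reading from stdin, it can be wrapped in a function that takes the string as an argument. This is a sketch under that assumption; `split-quoted` is a hypothetical name, and the logic is otherwise unchanged: split on unescaped quotes, then split only the even-numbered (unquoted) segments on spaces.

```clojure
(require '[clojure.string :as s])

(defn split-quoted
  "Hypothetical wrapper around the snippet above; takes the input as an argument."
  [line]
  (->> (s/split line #"(?<!\\)(?:'|\")")     ; split on quotes not preceded by a backslash
       (map #(%1 %2)                         ; alternate: split unquoted segments on spaces,
            (cycle [#(s/split % #" ") identity])) ; keep quoted segments whole
       flatten
       (filter seq)))                        ; drop empty strings

(split-quoted "foo \"bar baz\" 'fooy barish' foo")
```

Because both quote characters are treated as the same delimiter, a single quote inside a double-quoted argument (or vice versa) is split as if it closed the quoted part, so this only suits input where the two quote styles are never nested.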

I know this is a very old thread, but I came across this same problem and used Java interop to call:

(CommandLineUtils/translateCommandline cmd-line)

from Plexus Common Utilities (the class lives in the org.codehaus.plexus.util.cli package).
