
Processing text with elisp

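Ever since I converted to the Church of Emacs, I've been trying to do everything from inside it, and I was wondering how to do quick and efficient text processing with it.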

As an example, let's take this list I was editing in org-mode a few minutes ago:

** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b 
** Eduardo: b

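It's a list of names, each with its associated tags, and I want the inverse: a list of tags, each with the names associated with it.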

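In bash, I'd first echo the whole pasted text in single quotes, pipe it to awk, loop over each line adding each field to the right temporary variable, and then fiddle with it until it came out the way I wanted: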

echo '** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto, tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b
** Eduardo: b
' | awk '{sub(":","");for (i=3;i<=NF;i++) members[$i] = members[$i] " " $2}; END{for (j in members) print j ": " members[j]}' | sort

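...and TA-DA! The expected output in under 2 minutes, reached in an intuitive, incremental way. Can you show me how to do something like this in elisp, preferably in an Emacs buffer, elegantly and simply?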

Thanks!

The first thing I'd do is take advantage of org-mode's tag support. Instead of

** Diego: b QI

you would have

** Diego                          :b:QI:

which org-mode recognizes as a heading with the tags "b" and "QI".

To convert your current format to standard org-mode, you can use the following (assuming the buffer with the source is called "asdf"):

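(with-current-buffer "asdf"
  ;; Four find-and-replace passes over the whole buffer; this is the
  ;; non-interactive equivalent of repeated `replace-string' calls.
  (dolist (pair '((" " . ":")       ; "** Diego: b QI" -> "**:Diego::b:QI"
                  ("**:" . "** ")   ;                  -> "** Diego::b:QI"
                  ("::" . " :")     ;                  -> "** Diego :b:QI"
                  ("\n" . ":\n")))  ; at end of line   -> "** Diego :b:QI:"
    (goto-char (point-min))
    (while (search-forward (car pair) nil t)
      (replace-match (cdr pair) t t)))
  ;; Re-align all tags (Org 9.2+; older Org used `(org-set-tags-command t t)').
  (org-align-tags t))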

It's not pretty or efficient, but it gets the job done.
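As an alternative to the four buffer-wide passes, each line can be converted with a single regular expression. The sketch below assumes every heading looks like `** NAME: TAG TAG ...` with no colon inside NAME; `my/colon-list-to-org-tags` and `my/convert-buffer-to-org-tags` are hypothetical names, not part of Org.

```elisp
(defun my/colon-list-to-org-tags (line)
  "Turn \"** NAME: TAG1 TAG2\" into \"** NAME :TAG1:TAG2:\".
Return LINE unchanged when it does not look like such a heading."
  (if (string-match "\\`\\(\\*+\\) +\\([^:]+?\\) *: *\\(.*?\\) *\\'" line)
      (let ((stars (match-string 1 line))
            (name (match-string 2 line))
            ;; Whitespace-separated words after the colon become tags.
            (tags (split-string (match-string 3 line) "[ \t]+" t)))
        (if tags
            (format "%s %s :%s:" stars name (mapconcat #'identity tags ":"))
          (format "%s %s" stars name)))
    line))

(defun my/convert-buffer-to-org-tags ()
  "Rewrite every line of the current buffer with `my/colon-list-to-org-tags'."
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (let* ((beg (line-beginning-position))
             (end (line-end-position))
             (new (my/colon-list-to-org-tags
                   (buffer-substring-no-properties beg end))))
        ;; Replace the text of the line, keeping its newline.
        (delete-region beg end)
        (goto-char beg)
        (insert new))
      (forward-line 1))))
```

Run `M-: (my/convert-buffer-to-org-tags)` in the buffer holding the list. Words that are not valid Org tags (such as `pinto,` in one of the pasted versions, because of the comma) would still need fixing by hand.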

After that, you can use the following to produce a buffer in the same format as your shell script's output:

(require 'cl-lib)
(let ((results (get-buffer-create "results"))
      tags)
  ;; Collect every distinct local tag in buffer "asdf".
  (with-current-buffer "asdf"
    (goto-char (point-min))
    (while (org-at-heading-p)
      ;; (org-get-tags nil t) returns the local tags (Org 9.2+).
      (dolist (item (org-get-tags nil t))
        (cl-pushnew item tags :test #'equal))
      (outline-next-visible-heading 1)))
  (setq tags (sort tags #'string<))
  ;; For each tag, list the headings in "asdf" that carry it.
  (with-current-buffer results
    (erase-buffer)
    (dolist (item tags)
      (insert (format "%s: %s\n"
                      item
                      (mapconcat #'identity
                                 (with-current-buffer "asdf"
                                   (org-map-entries
                                    (lambda () (substring-no-properties (org-get-heading t)))
                                    item))
                                 " "))))))

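This puts the results in a buffer called "results", creating it if it doesn't exist yet. Basically, it collects all the tags in buffer "asdf", sorts them, then loops over each tag, finds every heading in "asdf" that has that tag, and inserts them into "results".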

With some cleanup this could become a function; basically you would just replace "asdf" and "results" with arguments. If you need that, I can do it.
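A possible function version, taking the two buffers as arguments, could look like the sketch below. It is not the snippet above verbatim: it swaps in a single `org-map-entries` pass that files each heading title under each of its tags in a hash table, instead of running one tag search per tag. `my/org-tags-to-headings` is a hypothetical name, and the two-argument `(org-get-tags nil t)` call assumes Org 9.2 or later.

```elisp
(require 'org)

(defun my/org-tags-to-headings (source dest)
  "For each tag used in SOURCE, write a line \"TAG: TITLE TITLE ...\" to DEST.
SOURCE and DEST are buffers or buffer names; DEST is created if it
does not exist.  Uses the two-argument `org-get-tags' of Org 9.2+."
  (let ((table (make-hash-table :test #'equal))
        (keys '()))
    (with-current-buffer source
      ;; One pass over all headings: file each title under each local tag.
      (org-map-entries
       (lambda ()
         (let ((title (substring-no-properties (org-get-heading t))))
           (dolist (tag (org-get-tags nil t))
             (push title (gethash (substring-no-properties tag) table)))))))
    (maphash (lambda (k _v) (push k keys)) table)
    (with-current-buffer (get-buffer-create dest)
      (erase-buffer)
      (dolist (tag (sort keys #'string<))
        ;; Titles were pushed newest-first; restore buffer order.
        (insert (format "%s: %s\n" tag
                        (mapconcat #'identity (reverse (gethash tag table)) " ")))))))
```

For example, `(my/org-tags-to-headings "asdf" "results")` should produce the same kind of listing as the snippet above, with the buffer names passed in.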

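There is a function, shell-command-on-region, that does exactly what its name says. You can select a region, press M-|, type the name of a shell command, and the region's text is piped to that command. Give it a prefix argument and the region is replaced with the command's output.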

For a simple example, select a region and type C-u 0 M-| wc RET (control-u, zero, meta-pipe, then wc): the region is replaced with wc's line, word, and byte counts for that text.
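The same thing can be done from Lisp by calling `shell-command-on-region` directly. A minimal sketch: `my/pipe-buffer-through` is a hypothetical helper that passes the current buffer as OUTPUT-BUFFER and a non-nil REPLACE argument, which makes the command's output replace the text it was given.

```elisp
(defun my/pipe-buffer-through (command)
  "Replace the whole current buffer with the output of shell COMMAND.
The buffer text is sent to COMMAND on standard input, like `C-u M-|'
on a region covering the entire buffer."
  (interactive "sShell command: ")
  ;; OUTPUT-BUFFER = current buffer, REPLACE = t: the output takes the
  ;; place of the text between START and END.
  (shell-command-on-region (point-min) (point-max) command
                           (current-buffer) t))
```

With the list in a buffer, `M-x my/pipe-buffer-through` followed by the `awk ... | sort` pipeline from the question should transform it in place, assuming awk and sort are on your PATH.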

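Another thing you can do is figure out how to edit one line, record that as a keyboard macro, and then run the macro repeatedly. For example, C-x ( C-s foo RET bar C-x ) searches for the word "foo" and then types "bar", turning it into "foobar". You can then type C-u 0 C-x e, which keeps running the macro until no more "foo" is found (an argument of 0 means "repeat until an error", here the failing search).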
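When the repeated edit is simple, the keyboard macro can also be written as a small loop. In this sketch (`my/append-after-each` is a hypothetical name), `search-forward` with NOERROR set to t returns nil instead of signalling an error once there are no more matches, which plays the role of the macro failing.

```elisp
(defun my/append-after-each (target suffix)
  "Insert SUFFIX right after every occurrence of TARGET after point.
This is the Lisp counterpart of recording `C-s TARGET RET SUFFIX' as
a keyboard macro and repeating it until the search fails."
  (save-excursion
    ;; `search-forward' leaves point after the match, so the inserted
    ;; SUFFIX is never searched again.
    (while (search-forward target nil t)
      (insert suffix))))
```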

OK, here's my first attempt in elisp:

  1. I started a buffer in emacs-lisp-mode with paredit-mode, typed an opening double quote and pasted the text
  2. I bound it to a symbol with let
(let ((foobar "** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto, tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b 
** Eduardo: b 
"))
  foobar)

Now I turn foobar into something fancier:

  1. First, I strip the * and : characters with a regexp and split the text into lines with (split-string)
  2. Then I mapcar over the lines, turning each line into a list of words
(mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t))
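  3. Then I create a hash table and bind it to temphash: (temphash (make-hash-table :test 'equal))
  4. Then I loop over the nested list, adding the elements to the hash table. I know I shouldn't use mapcar for non-functional programming, but nobody's looking ;)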
(mapcar #'(lambda (l)
            ;; (car l) is the name, (cdr l) its tags: prepend the name to the
            ;; string stored under each tag ("" if none yet).
            (mapcar #'(lambda (m) (puthash m (format "%s %s" (car l) (gethash m temphash "")) temphash))
                    (cdr l)))
        (mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t)))
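  5. Finally, I extract the hash table's elements into another set of nested lists, using a handy function stolen from Xah Lee's web page,
  6. and print the result to another buffer with M-x pp-eval-last-sexp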
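Steps 5 and 6 can also end in a sorted, human-readable listing instead of a pretty-printed list. A sketch, assuming the hash values are the space-separated name strings built in step 4 (which end in a space); `my/hash-to-sorted-alist` and `my/insert-tag-lines` are hypothetical names, and unlike Xah Lee's function the first one returns (KEY . VALUE) pairs.

```elisp
(require 'subr-x) ; for `string-trim'

(defun my/hash-to-sorted-alist (table)
  "Return the entries of TABLE as an alist of (KEY . VALUE), sorted by key."
  (let (result)
    (maphash (lambda (k v) (push (cons k v) result)) table)
    (sort result (lambda (a b) (string< (car a) (car b))))))

(defun my/insert-tag-lines (table buffer)
  "Replace the contents of BUFFER with one \"KEY: VALUE\" line per entry of TABLE."
  (with-current-buffer (get-buffer-create buffer)
    (erase-buffer)
    (dolist (pair (my/hash-to-sorted-alist table))
      ;; The values built in step 4 end with a space, so trim them.
      (insert (format "%s: %s\n" (car pair) (string-trim (cdr pair)))))))
```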

It's a bit convoluted, especially the double mapcar, but it sort of works. Here's the full "code":

;; Stolen from Xah Lee's page


(defun hash-to-list (hashtable)
  "Return a list that represents HASHTABLE as (KEY VALUE) pairs."
  (let (mylist)
    (maphash (lambda (kk vv) (setq mylist (cons (list kk vv) mylist))) hashtable)
    mylist))

;; Code

(let ((foobar "** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto, tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen : an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b 
** Eduardo: b 
")
      (temphash  (make-hash-table :test 'equal)))
  (mapcar #'(lambda (l)
              ;; (car l) is the name, (cdr l) its tags (`cdr' replaces `rest',
              ;; which comes from the deprecated cl.el library).
              (mapcar #'(lambda (m) (puthash m (format "%s %s" (car l) (gethash m temphash "")) temphash))
                      (cdr l)))
          (mapcar #'(lambda (y) (split-string y " " t)) (split-string (replace-regexp-in-string "[:\*]" "" foobar) "\n" t)))
  (hash-to-list temphash))

Here's the output:

(("clô" "anão ")
 ("clo" "george ")
 ("q" "Erick ")
 ("de" "walrus ")
 ("h" "henrique ")
 ("cb" "leandro ")
 ("lang" "Peter ")
 ("est" "Peter ")
 ("fur" "Aldo ")
 ("pol" "Peter Aldo ")
 ("qt" "davidatenas Gabriel eumané henrique LZZ ")
 ("mmu" "Luca ")
 ("prog" "Luca ")
 ("gnu" "Luca ")
 ("rpg" "Erick raphael ")
 ("mimimi" "george rol Vitor ")
 ("an" "davidatenas eumané rol CarlosIsaksen GustavoKyon William LZZ tony ")
 ("mu" "daniel ")
 ("gif" "kenny ")
 ("cri" "walrus kenny ")
 ("7arte" "davidatenas jeff rol frederico CarlosIsaksen Luca raphael caue ")
 ("c" "Rodrigo ")
 ("pseudo" "Igor FilipePinheiro rol Peter Aldo caue Andre ")
 ("maia" "Andre ")
 ("1997" "davidatenas anão Erick henrique Peter CarlosIsaksen William Luca tony Jost ")
 ("hq" "anão CarlosIsaksen Jost ")
 ("pc" "William Luca Alan ")
 ("mil" "Peter Aldo Andre Alan ")
 ("gtk" "jeff Erick henrique frederico Peter CarlosIsaksen GustavoKyon Epic daniel GP ")
 ("lit" "FilipePinheiro mathias frederico Peter Luca GP ")
 ("etc" "GustavoPupo ")
 ("tr" "GustavoPupo ")
 ("pinto," "GustavoPupo ")
 ("esp" "davidatenas tony FelipeAugusto ")
 ("pr0n" "Gabriel daniel Herbert um ")
 ("rsrs" "anão Gabriel daniel caue Herbert um ")
 ("jo" "anão Erick mathias leandro CarlosIsaksen William Vitor Jost Alan Koma ")
 ("QI" "bruno-gil Diego ")
 ("b" "Eduardo HHahaah anão Erick Igor rol leandro Aldo William Luca raphael Vitor daniel caue Herbert Jost bruno-gil Diego "))

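If you know *nix pipes, you already know the core idea of functional programming, because functional programming views a program as a sequence of data transformations performed by applying functions. Remember function composition from school math? g∘f means "apply f first, then immediately apply g": (g∘f)(x) = g(f(x)). A functional program is one big function composition. A pipe is just function composition written in the opposite direction: (g∘f)(x) in mathematics is the same as x | f | g on the command line.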
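To make the notation concrete in Emacs Lisp, here is a tiny composition helper. `my/compose` is hypothetical (dash.el ships its own `-compose`); it is built with `apply-partially` so that it does not depend on `lexical-binding` being enabled.

```elisp
(defun my/compose (g f)
  "Return a function that computes (G (F X)), i.e. g after f."
  ;; `apply-partially' fixes G and F as the first two arguments, so no
  ;; closure over local variables is needed.
  (apply-partially (lambda (g f x) (funcall g (funcall f x))) g f))
```

Note the order: the pipe x | f | g corresponds to `(funcall (my/compose g f) x)`, so the function written last in the pipe is written first in the composition. For instance, `(funcall (my/compose #'length #'split-string) "b jo rpg q")` first splits the string into words, then counts them.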

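There's a third-party library, dash.el, that provides a wealth of functions and macros for list and tree transformations and for making the functional approach easier. One of them is the threading macro ->>, which mimics the command-line pipe: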

(->> '(1 2 3) (-map '1+) (-reduce '+)) ; returns 9
;; equivalent to (-reduce '+ (-map '1+ '(1 2 3)))
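If you'd rather avoid the dependency, Emacs 25.1 and later ship a similar threading macro, `thread-last`, in the subr-x library. A sketch of the same computation with built-ins only (`my/sum-of-successors` is a hypothetical name); instead of dash's `-reduce`, it sums with `apply`.

```elisp
(require 'subr-x) ; `thread-last' is defined here (Emacs 25.1 and later)

(defun my/sum-of-successors (numbers)
  "Add 1 to each of NUMBERS and return the sum."
  ;; Each form receives the previous result as its last argument:
  ;; expands to (apply #'+ (mapcar #'1+ numbers)).
  (thread-last numbers
    (mapcar #'1+)
    (apply #'+)))
```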

So if we want to manipulate text data by applying operations one after another, our function might look like this:

(defun key-value-swap (s)
  (->> s
       nil ; Split into lines
       nil ; Remove stars from each line
       nil ; Split each line
       nil ; Add 1st element as a value to each element starting from
           ; 2nd as keys
       nil ; Return a hash-table
       ))

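A function that does exactly what you asked for looks like this (besides dash.el, it uses the s.el string library for s-lines, s-split and s-chop-prefix):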

(defun key-value-swap (s)
  (let ((h (make-hash-table :test 'equal)))
    (->> s
         s-lines ; split into lines
         (--map (s-split "\\(\\s-\\|:\\)" ; split each line
                         (s-chop-prefix "** " it) ; throw away stars
                         t))
         (--map (-each (cdr it) ; for every field in the line, except 1st
                  (lambda (k) ; append 1st line to value under key
                    (puthash k (cons (car it) (gethash k h)) h)))))
    h)) ; return hash-table

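(puthash k (cons (car it) (gethash k h)) h) looks cryptic, but it just means that each key in the hash table holds a list, and every time a new value is found for that key it is pushed onto the front of that list. So if b holds (Diego) and we find that bruno-gil also belongs under b, the value under b becomes (bruno-gil Diego).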
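If you'd rather not depend on dash.el and s.el, the same transformation can be written with built-ins only. It also shows that the puthash/cons/gethash combination can be spelled `(push VALUE (gethash KEY TABLE))`, because `gethash` is a generalized variable (a "place") in Emacs Lisp. A sketch; `my/key-value-swap` is a hypothetical name, and `string-remove-prefix` needs subr-x (Emacs 25.1+).

```elisp
(require 'subr-x) ; for `string-remove-prefix'

(defun my/key-value-swap (s)
  "Return a hash table mapping each tag in S to the names carrying it.
S holds lines of the form \"** NAME: TAG TAG ...\"."
  (let ((h (make-hash-table :test #'equal)))
    (dolist (line (split-string s "\n" t))
      ;; "** Diego: b QI" -> ("Diego" "b" "QI"): drop the stars, then
      ;; split on runs of whitespace and colons.
      (let ((fields (split-string (string-remove-prefix "** " line)
                                  "[[:space:]:]+" t)))
        (dolist (tag (cdr fields))
          ;; Same as (puthash tag (cons (car fields) (gethash tag h)) h).
          (push (car fields) (gethash tag h)))))
    h))
```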

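The previous alternatives are interesting, but I'm not convinced they capture the spirit of the question, "how would I do this in Emacs as a recent convert?". I suspect someone learning Emacs, with an eye toward using Emacs Lisp to do the whole job, might start with: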

(defun create-tags-to-name (buffer-name)
  "Fill buffer BUFFER-NAME with lines `** TAG: LIST-OF-NAMES'.
The lines are built by transposing the lines of the current buffer
that match the format `** NAME: LIST-OF-TAGS', where the list items
are whitespace separated."
  (interactive "BDestination buffer: ")
  (let ((buf (get-buffer-create buffer-name))
        (tag-to-name-list (list))
        name tags element)
    ;; Clear the destination buffer
    (with-current-buffer buf
      (erase-buffer))
    ;; Build the list of tag to name associations, scanning the whole buffer.
    (save-excursion
      (goto-char (point-min))
      ;; NAME is any run of non-blank, non-colon characters (so "anão" and
      ;; "CarlosIsaksen :" match too); the tags are the rest of the line.
      (while (re-search-forward "^\\*\\* \\([^: \t\n]+\\)[ \t]*:\\(.+\\)$" nil t)
        (setq name (match-string-no-properties 1)
              tags (split-string (match-string-no-properties 2)))
        ;; For each tag add the name to the tag's name list
        (while tags
          (let ((tag (car tags)))
            (setq element (assoc tag tag-to-name-list)
                  tags (cdr tags))
            (if element
                (setcdr element (append (list name) (cdr element)))
              (setq tag-to-name-list
                    (append (list (cons tag (list name))) tag-to-name-list)))))))
    ;; Dump the associations to the target buffer
    (with-current-buffer buf
      (while tag-to-name-list
        (setq element (car tag-to-name-list)
              tag-to-name-list (cdr tag-to-name-list))
        (insert (concat "** " (car element) ":"))
        (let ((tag-list (cdr element)))
          (while tag-list
            (insert " " (car tag-list))
            (setq tag-list (cdr tag-list))))
        (insert "\n")))))
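The same alist-based transposition can be written more compactly by treating `alist-get` as a place, so that `push` either extends an existing entry or adds a new one. This is a sketch, not a drop-in replacement: the hypothetical `my/tags-to-names-alist` takes a list of lines instead of scanning a buffer, assumes names contain no spaces, and relies on the TESTFN argument of `alist-get` (Emacs 26.1+) to compare string keys with `equal`.

```elisp
(defun my/tags-to-names-alist (lines)
  "Return an alist of (TAG . NAMES) built from LINES like \"** NAME: TAGS\".
Tags and names keep the order in which they first appear in LINES."
  (let ((alist '()))
    (dolist (line lines)
      (when (string-match "\\`\\*\\* \\([^: \t]+\\)[ \t]*:\\(.*\\)\\'" line)
        (let ((name (match-string 1 line)))
          (dolist (tag (split-string (match-string 2 line)))
            ;; `alist-get' is a place: `push' updates the entry for TAG
            ;; or adds a new one; `equal' compares the string keys.
            (push name (alist-get tag alist nil nil #'equal))))))
    ;; New tags and names were pushed onto the front; restore input order.
    (mapcar (lambda (entry) (cons (car entry) (reverse (cdr entry))))
            (nreverse alist))))
```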

Here's my second attempt. I wrote a small macro and a few functions to process this data:

(defun better-numberp (s)
  (string-match "^ *[0-9.,]* *$" s))

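;; NOTE: the forms stored in CALLS are run with `funcall' and `eval', so
;; they can only see `%', `%r' and the caller's variables (such as
;; `foohash' below) under dynamic binding, i.e. with `lexical-binding' nil.
(defmacro awk-like (&rest args)
  (let ((arg (car (last args)))
        (calls (mapcar #'(lambda (l)
                           (cond
                            ;; (N BODY): apply BODY to field number N
                            ((numberp (car l)) (cons `(lambda (f) (equal %r ,(car l))) (cdr l)))
                            ;; ("REGEXP" BODY): apply BODY to fields matching REGEXP
                            ((stringp (car l)) (cons `(lambda (f) (string-match ,(car l) %)) (cdr l)))
                            ;; (PREDICATE BODY): use the predicate as given
                            (t l)))
                       (butlast args))))
    `(mapcar #'(lambda (%%)
                 (let ((%r 0))
                   (mapcar
                    #'(lambda (l)
                        (setq %r (1+ %r))
                        (let ((% l))
                          (dolist (tipo ',calls)
                            (progn
                              (setq % (cond
                                       ((funcall (car tipo) %) (eval (cadr tipo))) (t %)))
                              ;; Also store the current field in the global variable %N.
                              (set (intern (format "%%%d" %r)) %))) %)) %%)))
             (mapcar #'(lambda (y) (split-string y " " t))
                     (split-string ,arg "\n" t)))))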

(defun hash-to-list (hashtable)
  "Return a list that represents HASHTABLE as (KEY VALUE) pairs."
  (let (mylist)
    (maphash (lambda (kk vv) (setq mylist (cons (list kk vv) mylist))) hashtable)
    mylist))

(defun append-hash (key value hashtable)
  (let ((current (gethash key hashtable)))
    (puthash key 
             (cond
              ((null current) (list value))
              ((listp current) (cons value current))
              (t current)) 
             hashtable)))
(let ((foohash (make-hash-table :test 'equal)))
  (awk-like
   (2 (replace-regexp-in-string ":" "" %))
   ((lambda (f) (> %r 2))  (append-hash % %2 foohash))
   "** Diego: b QI
** bruno-gil: b QI
** Koma: jo
** um: rsrs pr0n
** FelipeAugusto: esp
** GustavoPupo: pinto tr etc
** GP: lit gtk
** Alan: jo mil pc
** Jost: b hq jo 1997
** Herbert: b rsrs pr0n
** Andre: maia mil pseudo
** Rodrigo: c
** caue: b rsrs 7arte pseudo
** kenny: cri gif
** daniel: gtk mu pr0n rsrs b
** tony: an 1997 esp
** Vitor: b jo mimimi
** raphael: b rpg 7arte
** Luca: b lit gnu pc prog mmu 7arte 1997
** LZZ: an qt
** William: b an jo pc 1997
** Epic: gtk
** Aldo: b pseudo pol mil fur
** GustavoKyon: an gtk
** CarlosIsaksen: an hq jo 7arte gtk 1997
** Peter: pseudo pol mil est 1997 gtk lit lang
** leandro: b jo cb
** frederico: 7arte lit gtk
** rol: b an pseudo mimimi 7arte
** mathias: jo lit
** henrique: 1997 h gtk qt
** eumané: an qt
** walrus: cri de
** FilipePinheiro: lit pseudo
** Igor: pseudo b
** Erick: b jo rpg q 1997 gtk
** Gabriel: pr0n rsrs qt
** george: clo mimimi
** anão: hq jo 1997 rsrs clô b
** jeff: 7arte gtk
** davidatenas:  an 7arte 1997 esp qt
** HHahaah: b
** Eduardo: b
")
  (hash-to-list foohash))
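The macro above needs dynamic binding because it stores forms and evaluates them later. An alternative design is to pass an ordinary function that receives awk's NR and the list of fields, which works under either binding mode. Both `my/awk-records` and `my/invert-tag-list` below are hypothetical names, and the field handling mirrors the awk program from the question rather than this macro's exact rules.

```elisp
(require 'subr-x) ; for `string-remove-suffix'

(defun my/awk-records (text fn)
  "Call FN on each non-empty line of TEXT, awk style.
FN receives the record number NR (starting at 1) and the list of
whitespace-separated fields, so (nth 0 FIELDS) is awk's $1.
Return the list of FN's return values."
  (let ((nr 0))
    (mapcar (lambda (line)
              (setq nr (1+ nr))
              (funcall fn nr (split-string line)))
            (split-string text "\n" t))))

(defun my/invert-tag-list (text)
  "Return a hash table mapping each tag in TEXT to its names, newest first."
  (let ((members (make-hash-table :test #'equal)))
    (my/awk-records text
                    (lambda (_nr fields)
                      ;; $2 is the name; drop its trailing colon.
                      (let ((name (string-remove-suffix ":" (nth 1 fields))))
                        ;; $3..$NF are the tags; skip a detached ":" field,
                        ;; as in "CarlosIsaksen : ...".
                        (dolist (tag (remove ":" (nthcdr 2 fields)))
                          (push name (gethash tag members))))))
    members))
```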

