How to delete duplicate lines in Emacs

I have a text with a lot of lines. How can I delete the duplicate lines in Emacs, using built-in commands or Elisp packages, without external utilities?

For example:

this is line a
this is line b
this is line a

I want to remove the 3rd line (the same as the 1st line), leaving:

this is line a
this is line b

If you have Emacs 24.4 or newer, the cleanest way to do it would be the new delete-duplicate-lines function. Note that

  • this works on a region, not the whole buffer, so select the desired text first
  • it maintains the relative order of the originals, deleting the duplicates

For example, if your input is

test
dup
dup
one
two
one
three
one
test
five

M-x delete-duplicate-lines would make it

test
dup
one
two
three
five

You also have the option of searching backwards by prefixing the command with the universal argument (C-u), which keeps the last occurrence of each line instead of the first. The result would then be

dup
two
three
one
test
five

Credit goes to emacsredux.com.
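
The same function can also be called from Lisp. The following is a minimal sketch, assuming Emacs 24.4 or newer. The helper name my-dedup-sample is made up for illustration; the optional third argument of delete-duplicate-lines is the REVERSE flag that C-u sets interactively.

```elisp
;; Hypothetical helper, not part of Emacs: runs delete-duplicate-lines
;; on the sample text from above inside a temporary buffer.
(defun my-dedup-sample (&optional reverse)
  "Return the sample text with duplicate lines removed.
With REVERSE non-nil, search backward, as C-u does interactively."
  (with-temp-buffer
    (insert "test\ndup\ndup\none\ntwo\none\nthree\none\ntest\nfive\n")
    ;; BEG and END cover the whole temporary buffer here.
    (delete-duplicate-lines (point-min) (point-max) reverse)
    (buffer-string)))

(my-dedup-sample)    ; keeps the first occurrence of each line
(my-dedup-sample t)  ; keeps the last occurrence of each line
```

Working in a temporary buffer keeps the experiment away from your real text, which is convenient when trying out the two directions.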

Other roundabout options, available via Eshell, that do not give quite the same result:

  1. sort -u: doesn't maintain the relative order of the originals
  2. uniq: worse, it only removes adjacent duplicates, so its input needs to be sorted first
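
To make the difference between these strategies concrete, here is a small sketch in Emacs Lisp that works on lists of lines rather than on a buffer. The my- function names are invented for this illustration and only approximate the shell tools; delete-dups, sort and split-string are standard Emacs functions.

```elisp
(defun my-lines (text)
  "Split TEXT into a list of non-empty lines."
  (split-string text "\n" t))

(defun my-dedup-keep-order (text)
  "Drop duplicates, keeping first occurrences in their original order."
  (delete-dups (my-lines text)))

(defun my-dedup-sorted (text)
  "Roughly what `sort -u' does: unique lines, but in sorted order."
  (sort (delete-dups (my-lines text)) #'string<))

(defun my-dedup-adjacent (text)
  "Roughly what `uniq' does: drop a line only if it repeats the previous one."
  (let (result)
    (dolist (line (my-lines text))
      (unless (equal line (car result))
        (push line result)))
    (nreverse result)))

(my-dedup-keep-order "b\na\nb\na\na")  ; => ("b" "a")
(my-dedup-sorted "b\na\nb\na\na")      ; => ("a" "b")
(my-dedup-adjacent "b\na\nb\na\na")    ; => ("b" "a" "b" "a")
```

Only the first variant both removes every duplicate and preserves the original order, which is what the question asks for.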

Put this code in your .emacs:

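(defun uniq-lines (beg end)
  "Remove duplicate lines in region, keeping each first occurrence.
Called from a program, there are two arguments:
BEG and END (region to process)."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      ;; Compare lines exactly; by default searches may ignore case.
      (let ((case-fold-search nil))
        (while (not (eobp))
          ;; Read the current line without touching the kill ring.
          (let ((line (buffer-substring-no-properties
                       (line-beginning-position) (line-end-position))))
            (forward-line 1)
            (let ((next-line (point)))
              ;; Delete every later line that is identical to LINE.
              (while (and (not (eobp))
                          (re-search-forward
                           (concat "^" (regexp-quote line) "$") nil t))
                (delete-region (line-beginning-position)
                               (min (point-max) (1+ (line-end-position)))))
              (goto-char next-line))))))))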

Usage:

M-x uniq-lines

On Linux, select the region and type

M-| uniq <RETURN>

The result, without duplicates, is shown in a new buffer. Note that uniq only removes adjacent duplicate lines.

(require 'cl-lib)                       ; for cl-position-if

(defun unique-lines (start end)
  "Remove all duplicate lines in the region.
Note that empty lines count as duplicates of the empty line! All empty
lines are removed except the first one, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string (buffer-substring-no-properties start end) "$" t)
               ;; Result form: runs once every line has been hashed.
               (let ((lines (make-vector (1+ i) nil)))
                 ;; Each value is the index of the line's first occurrence.
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (goto-char start)
                 (insert (mapconcat #'identity lines "\n"))))
      (setq s                           ; splitting on "$" leaves the
                                        ; newline at the start of S
            (substring
             s (or (cl-position-if
                    (lambda (x)
                      (not (or (char-equal ?\n x) (char-equal ?\r x)))) s)
                   (length s))))        ; only newlines: an empty line
      (unless (gethash s hash)
        (puthash s (setq i (1+ i)) hash)))))

The unique-lines function above is an alternative that:

  • Does not use the undo history to store matches.
  • Is in general faster (but if you are after ultimate speed, build a prefix tree).
  • Replaces all former newline characters, whatever they were, with \n (UNIX-style), which may be a bonus or a disadvantage depending on your situation.
  • Could be made a little faster by re-implementing split-string so that it accepts characters instead of a regular expression.
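
The hash-table idea behind this alternative, and its newline side effect, can be sketched on a plain string. This is an illustration under assumptions, not the answer's code: my-unique-lines-in-string is a made-up name, and the "\r?\n" separator is chosen only to show how mixed line endings collapse to \n.

```elisp
(defun my-unique-lines-in-string (text)
  "Return TEXT with duplicate lines removed, joined with newlines."
  (let ((seen (make-hash-table :test #'equal))
        result)
    ;; Accept both Unix and DOS line endings when splitting.
    (dolist (line (split-string text "\r?\n"))
      (unless (gethash line seen)
        (puthash line t seen)
        (push line result)))
    ;; Whatever the original endings were, the output uses a plain newline.
    (mapconcat #'identity (nreverse result) "\n")))

(my-unique-lines-in-string "a\r\nb\r\na")  ; => "a\nb"
```

Each line is looked up in the hash table once, so the work grows roughly linearly with the number of lines instead of rescanning the rest of the text for every line.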

A somewhat longer but perhaps slightly more efficient variant:

(defun split-string-chars (string chars &optional omit-nulls)
  "Split STRING at each character found in the list CHARS.
If OMIT-NULLS is non-nil, drop empty substrings from the result."
  (let ((separators (make-hash-table))
        (last 0)
        result)
    (dolist (c chars) (puthash c t separators))
    (dotimes (i (length string))
      (when (gethash (aref string i) separators)
        ;; Emit the piece that ends just before this separator.
        (when (or (not omit-nulls) (/= last i))
          (push (substring string last i) result))
        (setq last (1+ i))))
    ;; Emit whatever follows the last separator.
    (when (or (not omit-nulls) (< last (length string)))
      (push (substring string last) result))
    (nreverse result)))

(defun unique-lines (start end)
  "Remove all duplicate lines in the region.
Note that empty lines are dropped entirely, because the split omits
empty strings."
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string-chars
                (buffer-substring-no-properties start end) '(?\n) t)
               ;; Result form: runs once every line has been hashed.
               (let ((lines (make-vector (1+ i) nil)))
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (goto-char start)
                 (insert (mapconcat #'identity lines "\n"))))
      (unless (gethash s hash)
        (puthash s (setq i (1+ i)) hash)))))

Another way:

  1. Select a region of text.
  2. Type C-u (prefix argument), then M-| (shell-command-on-region), then sort -u as the command to run. With the prefix argument, the output of the command replaces the selection.
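
If the external sort program is not available, a similar effect can be approximated with built-in commands. The following is a sketch, assuming Emacs 24.4 or newer for delete-duplicate-lines. The name my-sort-unique-lines is hypothetical, and the ordering produced by sort-lines may differ from that of sort -u, which depends on the locale.

```elisp
(defun my-sort-unique-lines (beg end)
  "Sort the lines between BEG and END, then delete duplicate lines."
  (interactive "r")
  (save-excursion
    (save-restriction
      ;; Narrowing keeps both steps inside the selected text even
      ;; though the deletion changes the position of its end.
      (narrow-to-region beg end)
      (sort-lines nil (point-min) (point-max))
      (delete-duplicate-lines (point-min) (point-max)))))

;; Example on a temporary buffer:
(with-temp-buffer
  (insert "b\na\nb\na\n")
  (my-sort-unique-lines (point-min) (point-max))
  (buffer-string))  ; => "a\nb\n"
```

Like sort -u, this does not preserve the original order of the lines.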
