Check for file duplication and remove

I want to remove duplicated files and directories that exist in two particular locations. I would like to create a script that checks the contents of directories "A" and "B" and, wherever a directory in "B" is also present in "A", removes it from "B".

EXAMPLE:

/some/path/a

dir1
    file1.ext
    file2.ext
    file3.ext
dir2
    file1.ext
    file2.ext
    file3.ext

/some/path/b

dir1
    file1.ext
    file2.ext
    file3.ext
dir3
    file1.ext
    file2.ext
    file3.ext

In this example, the desired outcome would be to recognize that "dir1" exists in both places and then remove "dir1" and its contents from /some/path/b leaving everything else alone. I have played around in the terminal trying to achieve these results and looked online for answers but haven't found anything that fits this particular use case. Any help would be much appreciated.

Using bash and comm from GNU coreutils:

a=some/path/a
b=some/path/b

mapfile -d '' dirs_to_del <                           \
  <(comm -z12                                         \
    <(shopt -s nullglob; cd "$a" && printf '%s\0' */) \
    <(shopt -s nullglob; cd "$b" && printf '%s\0' */))
cd "$b" && echo rm -rf -- "${dirs_to_del[@]}"

Drop the echo if the output looks OK. Note that mapfile -d needs bash 4.4 or later.
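To see what comm contributes here, a minimal sketch on two hand-written lists may help. The file names a.list and b.list are made up for the illustration, and the lists are newline-delimited, so the GNU-only -z flag (which switches comm to NUL delimiters so that unusual names survive) is left out.

```shell
# Two sorted lists of directory names, similar to what the */ globs would produce.
tmp=$(mktemp -d)
printf '%s\n' dir1/ dir2/ > "$tmp/a.list"
printf '%s\n' dir1/ dir3/ > "$tmp/b.list"

# -1 hides lines unique to the first file, -2 hides lines unique to the second,
# so only the lines found in both files remain.
common=$(comm -12 "$tmp/a.list" "$tmp/b.list")
printf '%s\n' "$common"    # prints: dir1/

rm -rf "$tmp"
```

comm expects both inputs to be sorted, which glob expansion already provides.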

You're looking for something like this:

A=/some/path/a
B=/some/path/b

shopt -s nullglob

for dir in "$B"/*/; do
  if test -d "$A${dir#"$B"}"; then
    echo rm -r -- "$dir"
  fi
done

Remove echo if you're happy with the output.
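If shopt is not available (plain sh, for instance), the same loop can be sketched as a function with a dry-run default. The name dedupe_dirs and the "run" argument are hypothetical, not part of the answer above; the -d test stands in for nullglob by skipping a glob that matched nothing.

```shell
# dedupe_dirs A B [run]
# Hypothetical helper: for each top-level directory in B that also exists in A,
# print what would be removed, or remove it when the third argument is "run".
dedupe_dirs() {
  for dir in "$2"/*/; do
    [ -d "$dir" ] || continue        # unmatched glob stays literal; skip it
    name=${dir#"$2"/}                # e.g. "dir1/"
    if [ -d "$1/$name" ]; then
      if [ "${3:-dry}" = run ]; then
        rm -r -- "$dir"
      else
        printf 'would remove %s\n' "$dir"
      fi
    fi
  done
}
```

Calling dedupe_dirs /some/path/a /some/path/b only prints the candidates; adding run as a third argument performs the removal.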

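I use something like this to remove the files in b that are identical to their counterparts in a:

diff -r -s a b | awk '/^Files .* are identical$/ {print $4}' | xargs rm --

Only the "are identical" lines reach rm, so the hunks that diff prints for differing files are ignored. File names containing whitespace will break the awk field split.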

Given a folder structure like this:

$ tree
.
├── a
│   ├── dir1
│   │   ├── file1.txt
│   │   ├── file2.txt
│   │   └── file3.txt
│   └── dir2
│       ├── file1.txt
│       ├── file2.txt
│       └── file3.txt
└── b
    ├── dir1
    │   ├── file1.txt
    │   └── file2.txt
    └── dir3
        ├── file1.txt
        ├── file2.txt
        └── file3.txt

diff -r -s a b should show:

Files a/dir1/file1.txt and b/dir1/file1.txt are identical
Files a/dir1/file2.txt and b/dir1/file2.txt are identical
Only in a/dir1: file3.txt
Only in a: dir2
Only in b: dir3

The -s option is described in the diff man page as:

       -s, --report-identical-files
              report when two files are the same
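As an alternative that does not parse diff's output, here is a rough sketch using find and cmp, which copes with spaces in file names. The function name remove_identical and the "run" argument are made up for this example, and both paths are assumed to be given without a trailing slash.

```shell
# remove_identical A B [run]
# Hypothetical helper: for each regular file under B that has a byte-identical
# counterpart at the same relative path under A, print it, or delete it from B
# when the third argument is "run".
remove_identical() {
  find "$2" -type f -exec sh -c '
    a=$1 b=$2 mode=$3; shift 3
    for f do
      rel=${f#"$b"/}                       # path relative to B
      if [ -f "$a/$rel" ] && cmp -s "$a/$rel" "$f"; then
        if [ "$mode" = run ]; then
          rm -- "$f"
        else
          printf "would remove %s\n" "$f"
        fi
      fi
    done' sh "$1" "$2" "${3:-dry}" {} +
}
```

remove_identical a b lists the identical files in b; remove_identical a b run deletes them. Directories left empty in b are not removed.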
