Find duplicate files and folders in a directory and move the duplicates to different folders in Python

I am very new to Python and I am looking for help. I am trying to find duplicate folders and files in a directory, move the duplicates to a folder called Duplicates in the same directory, and retain a single copy of all the files in a folder called Single_Copy. I am able to find the duplicates and add their info to a CSV file, but I am unable to create the Duplicates and Single_Copy folders and move the files into them. This piece of code is not showing the duplicated files properly. Could you please guide me? Please find my piece of code attached:

# checkDuplicates.py
# Python 2.7.6

"""
Given a folder, walk through all files within the folder and subfolders
and get list of all files that are duplicates
The md5 checksum for each file will determine the duplicates
"""

import os
import hashlib
from collections import defaultdict
import csv

src_folder = "C:/Users/renu/Desktop/SNow work related"
def generate_md5(fname, chunk_size=1024):
    """
    Function which takes a file name and returns md5 checksum of the file
    """
    md5_hash = hashlib.md5()
    with open(fname, "rb") as f:
        # Read the first block of the file
        chunk = f.read(chunk_size)
        # Keep reading until the end of the file, updating the hash as we go
        while chunk:
            md5_hash.update(chunk)
            chunk = f.read(chunk_size)

    # Return the hex checksum
    return md5_hash.hexdigest()


if __name__ == "__main__":
    """
    Starting block of script
    """

    # The dict will have a list as values
    md5_dict = defaultdict(list)

    file_types_inscope = ["ppt", "pptx", "pdf", "txt", "html",
                          "mp4", "jpg", "png", "xls", "xlsx", "xml",
                          "vsd", "py", "json"]

    # Walk through all files and folders within directory
    for path, dirs, files in os.walk(src_folder):
        print("Analyzing {}".format(path))
        for each_file in files:
            if each_file.split(".")[-1].lower() in file_types_inscope:
                # The path variable gets updated for each subfolder
                file_path = os.path.join(os.path.abspath(path), each_file)
                # If there are more files with same checksum append to list
                md5_dict[generate_md5(file_path)].append(file_path)

    # Identify keys (checksums) having more than one value (file names)
    duplicate_files = (
        val for key, val in md5_dict.items() if len(val) > 1)

    # Write the list of duplicate files to csv file
    with open("duplicates.csv", "w") as log:
        # lineterminator set for Windows, as the csv module otherwise inserts blank rows
        csv_writer = csv.writer(log, quoting=csv.QUOTE_MINIMAL, delimiter=",",
                                lineterminator="\n")
        header = ["File Names"]
        csv_writer.writerow(header)

        for file_name in duplicate_files:
            csv_writer.writerow(file_name)

    print("Done")

As @Grismar said, you can use the os or shutil modules.

import os
import shutil

os.rename("your/current/path/file.txt", "your/new/path/file.txt")
shutil.move("your/current/path/file.txt", "your/new/path/file.txt")

Personal preference: shutil. Note that on POSIX systems os.rename will silently replace an existing file with the same name (on Windows it raises an error instead), and os.rename cannot move files across filesystems, which shutil.move handles.
