
Find duplicate files and folders in a directory and move the duplicates to different folders in Python

I am very new to Python and am looking for help. I am trying to find duplicate folders and files within a directory and move them to a different folder (called Duplicates) in the same directory, while keeping a single copy of every file in a folder named Single_Copy. I am able to find the duplicates and write their information to a CSV file, but I am unable to create the Duplicates and Single_Copy folders or move the files into them. This code also does not display the duplicate files correctly. Please advise. Please find my piece of code attached below:

# checkDuplicates.py
# Python 2.7.6

"""
Given a folder, walk through all files within the folder and subfolders
and get list of all files that are duplicates
The md5 checcksum for each file will determine the duplicates
"""

import os
import hashlib
from collections import defaultdict
import csv

src_folder = "C://Users//renu//Desktop//SNow work related"
def generate_md5(fname, chunk_size=1024):
    """
    Function which takes a file name and returns md5 checksum of the file
    """
    # Use a name other than "hash" to avoid shadowing the builtin
    md5_hash = hashlib.md5()
    with open(fname, "rb") as f:
        # Read the first chunk of the file
        chunk = f.read(chunk_size)
        # Keep reading until the end of the file, updating the hash
        while chunk:
            md5_hash.update(chunk)
            chunk = f.read(chunk_size)

    # Return the hex digest of the checksum
    return md5_hash.hexdigest()


if __name__ == "__main__":
    """
    Starting block of script
    """

    # The dict will have a list as values
    md5_dict = defaultdict(list)

    file_types_inscope = ["ppt", "pptx", "pdf", "txt", "html",
                          "mp4", "jpg", "png", "xls", "xlsx", "xml",
                          "vsd", "py", "json"]

    # Walk through all files and folders within directory
    for path, dirs, files in os.walk(src_folder):
        print("Analyzing {}".format(path))
        for each_file in files:
            if each_file.split(".")[-1].lower() in file_types_inscope:
                # The path variable gets updated for each subfolder
                file_path = os.path.join(os.path.abspath(path), each_file)
                # If there are more files with same checksum append to list
                md5_dict[generate_md5(file_path)].append(file_path)

    # Identify keys (checksums) that have more than one value (file paths)
    duplicate_files = (
        val for key, val in md5_dict.items() if len(val) > 1)

    # Write the list of duplicate files to csv file
    with open("duplicates.csv", "w") as log:
        # Lineterminator added for windows as it inserts blank rows otherwise
        csv_writer = csv.writer(log, quoting=csv.QUOTE_MINIMAL, delimiter=",",
                                lineterminator="\n")
        header = ["File Names"]
        csv_writer.writerow(header)

        for file_name in duplicate_files:
            csv_writer.writerow(file_name)

    print("Done")

As @Grismar said, you can use the os and shutil modules:

import os
import shutil

# Either call moves/renames the file; use one or the other, not both
os.rename("your/current/path/file.txt", "your/new/path/file.txt")
shutil.move("your/current/path/file.txt", "your/new/path/file.txt")

Personal preference: shutil, because on Windows os.rename will silently replace an existing file that has the same name.
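Building on shutil.move, the full task (hash every file, keep one copy of each in Single_Copy, move the remaining duplicates into Duplicates) could be sketched as below. Note this is a minimal sketch, not the poster's code: it assumes Python 3 (for `os.makedirs(..., exist_ok=True)`), the `sort_duplicates` helper and the checksum prefix used to avoid name clashes are illustrative inventions, and the extension filter from the question is omitted for brevity.

```python
import os
import shutil
import hashlib
from collections import defaultdict


def generate_md5(fname, chunk_size=1024):
    """Return the MD5 hex digest of a file, read in chunks."""
    md5_hash = hashlib.md5()
    with open(fname, "rb") as f:
        chunk = f.read(chunk_size)
        while chunk:
            md5_hash.update(chunk)
            chunk = f.read(chunk_size)
    return md5_hash.hexdigest()


def sort_duplicates(src_folder):
    """Move the first file of each checksum group into Single_Copy
    and every further file with the same checksum into Duplicates."""
    single_dir = os.path.join(src_folder, "Single_Copy")
    dup_dir = os.path.join(src_folder, "Duplicates")
    os.makedirs(single_dir, exist_ok=True)
    os.makedirs(dup_dir, exist_ok=True)

    # Group all file paths by their checksum
    md5_dict = defaultdict(list)
    for path, dirs, files in os.walk(src_folder):
        # Prune the destination folders so moved files are not re-scanned
        dirs[:] = [d for d in dirs if d not in ("Single_Copy", "Duplicates")]
        for each_file in files:
            file_path = os.path.join(os.path.abspath(path), each_file)
            md5_dict[generate_md5(file_path)].append(file_path)

    for checksum, paths in md5_dict.items():
        first, rest = paths[0], paths[1:]
        shutil.move(first, os.path.join(single_dir, os.path.basename(first)))
        for dup in rest:
            # Prefix the checksum to avoid name clashes inside Duplicates
            target = "{}_{}".format(checksum[:8], os.path.basename(dup))
            shutil.move(dup, os.path.join(dup_dir, target))
```

Which of the identical files counts as "first" depends on os.walk ordering; if two distinct files share a basename, the copy in Single_Copy could still collide, so a real run would need the same prefixing trick there too.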


Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please credit this site or link to the original source. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM