
Directory comparison

I have a question about comparing directories. I requested a transfer of data from a company server and received it on 8 portable hard drives, because the volume was too large to fit on one; each drive contains about 1 TB of data. Now I want to make sure that all the files that were on the company server were fully transferred, and if any files are missing I want to be able to detect that and ask for them. The issue is that the only thing I received from the company is one txt file in which the detailed directory structure is saved in a tree format. In principle that's great, I could just go through it entry by entry, but with this amount of data that is not achievable. I can generate the same directory listing from every one of the 8 drives I received, but how can I then compare this one file against those 8 files? I tried different Python comparison scripts that parse the files line by line, but that does not work: they compare the files string by string (line by line), and the entries are not plain path strings, they are in tree-style format. Does anyone have suggestions on how to do this? Is there a way to convert a tree-format file into a plain string format so I can then run it through a Python program and compare? Or should I request (not sure if that's possible) another file with the directories saved in a format other than tree? If yes, how should it be generated?

I tried to parse it with lists in Python.

Your task consists of two subtasks:

  1. Prepare data;
  2. Compare trees.
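
For the portable drives, the data for subtask 1 can be collected straight from the file system, without going through a tree-format text file. Below is a minimal sketch using os.walk; the function name list_relative_paths and the drive path in the usage comment are hypothetical, and the sketch lists files only, so empty directories are not reported.

```python
import os


def list_relative_paths(root):
    """Return a sorted list of file paths under root, relative to root.

    Paths use '/' as the separator regardless of the operating system,
    so listings made on different machines can be compared directly.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            paths.append(rel.replace(os.sep, '/'))
    return sorted(paths)


# Hypothetical usage: 'E:/' stands for wherever one of the drives is mounted.
# for path in list_relative_paths('E:/'):
#     print(path)
```

If the company can run something similar on the server, a flat list of relative paths (one per line) would be much easier to compare than a tree-format file.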

Task 2 can be solved by this Tree class:

from collections import defaultdict


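class Tree:
    """A tree of path components; each node maps a name to its child Tree."""

    def __init__(self):
        # Missing children are created automatically on first access.
        self._children = defaultdict(Tree)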

    def __len__(self):
        return len(self._children)

    def add(self, value: str):
        # str.removeprefix / str.removesuffix require Python 3.9+
        value = value.removeprefix('/').removesuffix('/')
        if value == '':
            return
        first, rest = self._get_values(value)
        self._children[first].add(rest)

    def remove(self, value: str):
        value = value.removeprefix('/').removesuffix('/')
        if value == '':
            return
        first, rest = self._get_values(value)
        self._children[first].remove(rest)
        if len(self._children[first]) == 0:
            del self._children[first]

    def _get_values(self, value):
        values = value.split('/', 1)
        first, rest = values[0], ''
        if len(values) == 2:
            rest = values[1]
        return first, rest

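    def __iter__(self):
        # A node without children is a leaf: yield '' so that the parent
        # emits just its own key for this branch.
        if len(self._children) == 0:
            yield ''
            return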

        for key, child in self._children.items():
            for value in child:
                if value != '':
                    yield key + '/' + value
                else:
                    yield key


def main():
    tree = Tree()

    tree.add('a/b/c')
    tree.add('a/b/d')
    tree.add('b/b/e')
    tree.add('b/c/f')
    tree.add('c/b/e')

    # duplicates are okay
    tree.add('b/c/f')

    # leading and trailing slashes are okay
    tree.add('b/c/f/')
    tree.add('/b/c/f')

    print('Before removing:', [x for x in tree])

    tree.remove('a/b/c')
    tree.remove('a/b/d')
    tree.remove('b/b/e')

    # it's okay to remove non-existent values
    tree.remove('this/path/does/not/exist')

    # it will not remove root-level b folder, because it's not empty
    tree.remove('b')

    print('After removing:', [x for x in tree])


if __name__ == '__main__':
    main()

Output:

Before removing: ['a/b/c', 'a/b/d', 'b/b/e', 'b/c/f', 'c/b/e']
After removing: ['b/c/f', 'c/b/e']

So, your algorithm is as follows:

  1. Build a tree of the file paths on the company server (let's call it the A-tree);
  2. Build trees of the file paths on the portable hard drives (let's call them B-trees);
  3. Remove from the A-tree every file path that exists in one of the B-trees;
  4. Print the content of the A-tree: whatever is left is missing from the drives.
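
The same four steps can also be sketched with plain Python sets instead of the Tree class above. This is a swapped-in simplification, not the class-based method: it assumes every listing is already a flat list of paths relative to the same root, and that the server and the drives were listed the same way (for example, files only). The names normalize and find_missing are made up for this illustration.

```python
def normalize(path):
    """Drop leading and trailing slashes so 'a/b', '/a/b' and 'a/b/' match."""
    return path.strip('/')


def find_missing(server_paths, drives):
    """Return the server paths that appear on none of the drives.

    server_paths: iterable of paths from the company server (the "A" side).
    drives: iterable of path iterables, one per portable drive (the "B" side).
    """
    # Steps 1 and 2: collect the normalized paths of each side.
    remaining = {normalize(p) for p in server_paths}
    remaining.discard('')
    # Step 3: remove everything that exists on any drive.
    for drive_paths in drives:
        remaining -= {normalize(p) for p in drive_paths}
    # Step 4: whatever is left was not transferred.
    return sorted(remaining)


server = ['a/b/c', 'a/b/d', 'b/b/e', 'b/c/f', 'c/b/e']
drives = [['a/b/c', 'a/b/d'], ['/b/b/e/']]
print(find_missing(server, drives))
```

With the sample data this reports b/c/f and c/b/e as missing, the same paths that are left over in the Tree example.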

Now all you need is to build these trees, which is task 1 from the beginning of this answer. How to do that depends on the data you have in your .txt file.
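
Since the actual format of the .txt file is not shown in the question, the following is only a sketch under a loud assumption: that the file looks like the output of the Unix `tree` command, where each entry is introduced by `├── ` or `└── ` and each nesting level adds four characters of indentation. Windows `tree /F` output is laid out differently and would need a different parser. The function name tree_lines_to_paths is hypothetical.

```python
MARKERS = ('├── ', '└── ')


def tree_lines_to_paths(lines):
    """Convert Unix `tree`-style lines into slash-separated paths.

    Assumes each nesting level is exactly four characters wide.
    Lines without a branch marker (the root line, blank lines and the
    summary line) are skipped.
    """
    stack = []  # names of the current entry's ancestors, then the entry
    paths = []
    for line in lines:
        line = line.rstrip('\n')
        positions = [p for p in (line.find(m) for m in MARKERS) if p != -1]
        if not positions:
            continue
        pos = min(positions)           # the marker that introduces the entry
        depth = pos // 4               # four characters per nesting level
        name = line[pos + len(MARKERS[0]):]
        del stack[depth:]              # drop entries that are not ancestors
        stack.append(name)
        paths.append('/'.join(stack))
    return paths


sample = [
    '.',
    '├── a',
    '│   └── b',
    '│       ├── c',
    '│       └── d',
    '└── b',
    '    └── c',
    '        └── f',
    '',
    '5 directories, 3 files',
]
for path in tree_lines_to_paths(sample):
    print(path)
```

Each resulting path can be passed to Tree.add as is. Directories appear in the list as well as files, which is harmless for the Tree class because a directory path is just a prefix of the paths below it.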
