Selectively copying files from one folder tree to another folder

Question

I have a directory tree in which the folder names matter a lot. I also have a csv in which each row describes a path such as folder1 > folder2 > folder3 > foo.txt, with folder1, folder2, folder3 and the file name each in a separate column. I need to keep the directory structure as is and copy only the files listed in the csv.

The approach I am trying is to copy the whole directory tree and then write Python code to remove the unwanted files. That involves a lot of loops, and the csv has over 415,000 rows.

csv example:

pdf_no  folder1  folder2  folder3
1       abc      pqr      xyz

This is the format of the csv, and I have no trouble extracting the column data with a pandas DataFrame in Python. Originally it was a .dta file that I converted to .csv with pandas. So the hierarchy is 'folder1' > 'folder2' > 'folder3' > 'pdf_no'. The 'pdf_no' column contains the file names, which are numbers; these are the files we want in the given folder, with the file structure maintained.

So it takes a lot of time, and whenever I change something it takes a long time again, and I don't even know whether the result is correct.

Answer 1

You need the shutil.copytree method. Here is what you could do:

Read your CSV
Build the file path (with os.path.join())
Use shutil.copytree to copy the file and its parent directories to the target

You may have to add a try...except block to avoid an OSError when the target file already exists, or delete the target file before copying the new one.
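
Note that shutil.copytree copies a whole directory rather than a single file, so one way to apply it to this problem is through its ignore callback, which receives each directory and its entries and returns the names to skip. The following is only a sketch under assumptions: copy_selected and its wanted parameter are hypothetical names, wanted is assumed to be a set of file paths relative to the source folder (built from the csv rows), and the target folder is assumed not to exist yet.

```python
import os
import shutil


def copy_selected(source_dir, target_dir, wanted):
    """Copy source_dir to target_dir, keeping only the files in `wanted`.

    `wanted` is a collection of file paths relative to source_dir
    (hypothetical input, e.g. built from the csv rows).
    """
    source_dir = os.path.abspath(source_dir)
    wanted = {os.path.normpath(path) for path in wanted}

    def ignore(current_dir, names):
        # Called by copytree once per directory; returns the names to skip.
        rel_dir = os.path.relpath(current_dir, source_dir)
        skipped = []
        for name in names:
            if os.path.isdir(os.path.join(current_dir, name)):
                continue  # always descend into sub-folders
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            if rel_path not in wanted:
                skipped.append(name)
        return skipped

    # Raises FileExistsError if target_dir already exists.
    shutil.copytree(source_dir, target_dir, ignore=ignore)
```

Because every sub-folder is still visited, the full folder structure is recreated in the target, including folders that end up empty. The whole source tree is walked once, whatever the number of csv rows.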

Answer 2

Sample csv

pdf_no,folder1,folder2,folder3
1,abc,def,ghi
2,xyz,pqr,
3,abc,def,ghi
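
To make the mapping from rows to paths concrete, here is a small sketch that turns the sample above into relative file paths. The function name relative_paths and the '.pdf' extension are assumptions for illustration; the column names are the ones from the sample header.

```python
import csv
import io
import os

SAMPLE = """pdf_no,folder1,folder2,folder3
1,abc,def,ghi
2,xyz,pqr,
3,abc,def,ghi
"""


def relative_paths(csv_text, extension=".pdf"):
    """Return the relative file path described by each csv row."""
    paths = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        folders = [row["folder1"], row["folder2"], row["folder3"]]
        # Drop empty folder columns, as in row 2 of the sample.
        folders = [name for name in folders if name]
        paths.append(os.path.join(*folders, row["pdf_no"] + extension))
    return paths


for path in relative_paths(SAMPLE):
    print(path)
```

Rows 1 and 3 point into the same folder, so any code that creates target folders has to cope with a folder that already exists.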

Sample code

import csv
import os
import shutil


target_csv = 'selection.csv'
target_dir = 'selected_20190828/'
source_dir = 'original_directory/'

with open(target_csv, newline='') as f:
    rows = csv.reader(f)
    for line_no, row in enumerate(rows):
        if line_no == 0:  # Skip the first line because it's the header
            continue

        # Column 0 is the file number, the remaining columns are the folders
        pdf_name = row[0] + '.pdf'
        dir_path = os.path.join(*row[1:])

        source = os.path.join(source_dir, dir_path, pdf_name)
        if not os.path.isfile(source):
            print('not exist: ', line_no, source)
            continue
        target = os.path.join(target_dir, dir_path)
        # exist_ok=True: several rows can share a folder (rows 1 and 3 above)
        os.makedirs(target, exist_ok=True)
        shutil.copy2(source, target)

Explanation

You don't actually need pandas; all you need is:

csv.reader to read the csv file row by row, each row as a list
os.makedirs to create folders (with exist_ok=True it behaves like mkdir -p in bash)
os.path.join to build the source and target paths
shutil.copy2 to copy a file to a new folder
os.path.isfile to make sure the original file exists

I have tested the code above. It should work.
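
Since the question also asks how to tell whether the result is correct, one possible check is to compare the files found under the target folder with the rows of the csv. This is a sketch only: check_copy is a hypothetical helper, and it assumes the same csv layout and '.pdf' extension as the sample code.

```python
import csv
import os


def check_copy(csv_path, target_dir, extension=".pdf"):
    """Compare the files under target_dir with the rows of the csv.

    Returns (missing, unexpected) as two sorted lists of relative paths.
    """
    expected = set()
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            folders = [name for name in row[1:] if name]
            expected.add(os.path.join(*folders, row[0] + extension))

    found = set()
    for current_dir, _dirs, files in os.walk(target_dir):
        for name in files:
            full_path = os.path.join(current_dir, name)
            found.add(os.path.relpath(full_path, target_dir))

    return sorted(expected - found), sorted(found - expected)
```

Two empty lists mean that the target contains exactly the files listed in the csv. Files reported as missing may simply be absent from the source folder, which the sample code reports with its 'not exist' message.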

Answer 1  0  2019-08-28 07:51:10

Answer 2  0  accepted  2019-08-28 08:24:33
