Selectively copy files from one folder directory to another
I have a directory tree in which the folder names matter a lot. I also have a CSV in which each row describes a path of the form folder1 > folder2 > folder3 > foo.txt; folder1, folder2, folder3 and the file name are each in a different column of the CSV. I need to keep the directory structure as is and copy only the files that are listed in the CSV.
The approach I am trying is to copy the whole directory tree and then write Python code to remove the unwanted files. That involves a lot of loops, and the CSV has over 415,000 rows.
CSV example:
pdf_no  folder1  folder2  folder3
1       abc      pqr      xyz
This is the format of the CSV, and I have no trouble extracting the column data with a pandas DataFrame in Python. Originally it was a .dta file that I converted to .csv with pandas. So the path is 'folder1' > 'folder2' > 'folder3' > 'pdf_no'.
The 'pdf_no' column contains the file names, which are numbers. These are the files we want in the given folder, with the directory structure maintained.
My current approach takes a lot of time, every small change means another long run, and I don't even know whether the result is correct.
You need the shutil.copytree function. You may have to add a try...except block to avoid an OSError when the target already exists, or delete the target before copying the new one.
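As a rough sketch of how shutil.copytree could be applied here: its `ignore` argument accepts a callable that returns the names to skip in each directory, so the tree can be copied once while leaving out every file that is not listed in the CSV. The helper names `load_wanted` and `copy_selected`, the `.pdf` extension, and the column order (file number first, then the folder columns) are assumptions made for illustration, not part of the original answer. The `dirs_exist_ok=True` flag needs Python 3.8 or later and is used here instead of a try...except around an existing target.

```python
import csv
import os
import shutil


def load_wanted(csv_path, ext=".pdf"):
    """Return the set of wanted file paths, relative to the source root."""
    wanted = set()
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue
            # Assumed layout: pdf_no, folder1, folder2, folder3 (blanks skipped)
            folders = [part for part in row[1:] if part]
            wanted.add(os.path.normpath(os.path.join(*folders, row[0] + ext)))
    return wanted


def copy_selected(source_dir, target_dir, wanted):
    """Copy the tree, ignoring every file whose relative path is not wanted."""

    def ignore(current_dir, names):
        rel_dir = os.path.relpath(current_dir, source_dir)
        skipped = []
        for name in names:
            full = os.path.join(current_dir, name)
            rel = os.path.normpath(os.path.join(rel_dir, name))
            # Directories are always kept so the structure is preserved
            if os.path.isfile(full) and rel not in wanted:
                skipped.append(name)
        return skipped

    # dirs_exist_ok=True (Python 3.8+) avoids FileExistsError on re-runs
    shutil.copytree(source_dir, target_dir, ignore=ignore, dirs_exist_ok=True)
```

A call might look like `copy_selected('original_directory', 'selected', load_wanted('selection.csv'))`. Because the wanted paths are held in a set, each lookup is constant time, which matters with several hundred thousand rows. Note that this variant also recreates folders that end up empty.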
pdf_no,folder1,folder2,folder3
1,abc,def,ghi
2,xyz,pqr,
3,abc,def,ghi
import csv
import os
import shutil

target_csv = 'selection.csv'
target_dir = 'selected_20190828/'
source_dir = 'original_directory/'

with open(target_csv, newline='') as f:
    rows = csv.reader(f)
    for line_no, row in enumerate(rows):
        if line_no == 0:  # Skip the first line because it's the title
            continue
        pdf_name = row[0] + '.pdf'
        dir_path = os.path.join(*row[1:])  # folder1/folder2/folder3
        source = os.path.join(source_dir, dir_path, pdf_name)
        if not os.path.isfile(source):
            print('not exist: ', line_no, source)
            continue
        target = os.path.join(target_dir, dir_path)
        # exist_ok=True: several rows may point to the same folder
        os.makedirs(target, exist_ok=True)
        shutil.copy2(source, target)
You don't actually need pandas. All you need is:
- csv.reader to read the CSV file row by row, each row as a list
- os.makedirs to create folders (with exist_ok=True this is similar to mkdir -p in bash)
- os.path.join to build the paths
- shutil.copy2 to copy a file to a new folder
- os.path.isfile to make sure the original file exists
I have tested the code above. It should work.
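Since the question also asks how to tell whether the result is correct, one possible check is to compare the paths listed in the CSV with the files that actually exist under the target folder. The sketch below is only an illustration: the function names `expected_paths`, `files_under` and `check_copy` are hypothetical, and it assumes the same CSV layout and `.pdf` extension as the code above.

```python
import csv
import os


def expected_paths(csv_path, ext=".pdf"):
    """Relative paths the CSV asks for (assumed columns: pdf_no, folders...)."""
    expected = set()
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue
            folders = [part for part in row[1:] if part]
            expected.add(os.path.normpath(os.path.join(*folders, row[0] + ext)))
    return expected


def files_under(root):
    """Relative paths of all files that exist below root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(os.path.normpath(rel))
    return found


def check_copy(csv_path, target_dir):
    """Return (missing, unexpected) lists of relative paths."""
    expected = expected_paths(csv_path)
    actual = files_under(target_dir)
    return sorted(expected - actual), sorted(actual - expected)
```

Both lists being empty would mean the target holds exactly the files named in the CSV. Entries in the first list may simply be rows whose source file did not exist, which the copy script reports as 'not exist'.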