
Loop through sub-folders with Python on Mac to check if file exists, open, copy, paste then close file

I have a folder on my Mac desktop that contains numerous (~200) sub-folders.

Some (but not all) sub-folders contain a CSV file named "sample.csv".

Further, I have an "aggregate.csv" file into which I would like to copy the 2nd column of each "sample.csv".

Structure:

"/Desktop/folder"
"/Desktop/folder/aggregate.csv"

"/Desktop/folder/sub-folder"
"/Desktop/folder/sub-folder/sample.csv"

Using Python, how can I loop through each sub-folder, check if "sample.csv" exists, open it, copy the 2nd column, paste this column into the "aggregate.csv" file, close "sample.csv", then move on to the next sub-folder?

In "aggregate.csv", each copied column should be placed to the right of the previous one, so that it doesn't overwrite the "sample.csv" data that has just been pasted.

My computer opens CSV files with Excel, which is why I refer to the "2nd column".

Many thanks

$ cd ~

$ more aggregate.csv
X
X
X
X
X
X

$ more ./Desktop/folder/sub-folder/sample.csv
A,1
A,2
A,3
A,4
A,5

$ more ./Desktop/folder/sub-folder/sub-sub-folder/sample.csv
B,6
B,7
B,8
B,9

$ more ./Desktop/folder/sub-folder2/sample.csv
C,10
C,11
C,12
C,13
C,14
C,15
C,16

$ more ./Desktop/folder/sub-folder3/sub-sub-folder/sample.csv
D,17
D,18
D,19

$ python3 aggregate_samples.py ./Desktop
./Desktop/folder/sub-folder/sample.csv
./Desktop/folder/sub-folder/sub-sub-folder/sample.csv
./Desktop/folder/sub-folder2/sample.csv
./Desktop/folder/sub-folder3/sub-sub-folder/sample.csv

$ cat aggregate.csv
X,1,6,10,17
X,2,7,11,18
X,3,8,12,19
X,4,9,13,
X,5,,14,
X,,,15,
,,,16,

Here is the code that accomplishes this. The key pieces you need are: os.walk() to search the folders recursively, the csv module to read the sample.csv files (and extract the 2nd column), lists to accumulate the samples, and csv again to write out the result. I assumed your sample.csv files may have different lengths, so the code handles that (by pre-filling a matrix with empty cells before copying the samples in).

This assumes your dataset is small enough to fit into memory. If not, more work needs to be done.

# aggregate_samples.py
import os
import sys
import argparse
import csv

def main(options):
    columns = []

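    try:
        # Load an existing aggregate.csv, if there is one, as a list of
        # columns so that new samples are appended to its right.
        with open('aggregate.csv', newline='') as f:
            rows = list(csv.reader(f))
        width = max((len(row) for row in rows), default=0)
        for i in range(width):
            columns.append([row[i] if i < len(row) else '' for row in rows])
    except FileNotFoundError:
        # Doesn't exist; create it later.
        pass

    # Start from the length of the existing columns so the matrix
    # built below has room for them as well as for the samples.
    longest_sample = max((len(column) for column in columns), default=0)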
    for d, subdirs, files in os.walk(options.directory):
        subdirs.sort()
        for filename in files:

            if filename == 'sample.csv':
                file_path = os.path.join(d, filename)
                print(file_path)

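                samples = []
                with open(file_path, newline='') as f:
                    reader = csv.reader(f, delimiter=',')
                    # Get the 2nd column; rows with fewer than two
                    # fields (e.g. blank lines) contribute an empty cell.
                    for sample in reader:
                        samples.append(sample[1] if len(sample) > 1 else '')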
                longest_sample = max(longest_sample, len(samples))
                columns.append(samples)

    # Pre-fill a transposed matrix with empty cells: one row per entry
    # of the longest column, one cell per column.
    a = [[''] * len(columns) for _ in range(longest_sample)]

    # Move samples into matrix, transposing as you go.
    for i in range(len(columns)):
        for j in range(len(columns[i])):
            a[j][i] = columns[i][j]

    # Output matrix as CSV.
    with open('aggregate.csv', 'w', newline='') as aggregate:
        writer = csv.writer(aggregate, delimiter=',')
        writer.writerows(a)

    return 0

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'directory',
        help='Directory path.')
    options = parser.parse_args()
    sys.exit(main(options))
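
If the samples ever become too large to hold in memory, one possible direction is to stream them instead. The following is only a rough sketch under several assumptions, not a drop-in replacement for the script above: the helper names find_samples, second_column and stream_aggregate are made up for illustration, it writes a fresh output file rather than extending an existing aggregate.csv, and it keeps every sample.csv open at once, so the operating system's limit on open files applies. It relies on itertools.zip_longest to pad the shorter columns with empty cells.

```python
import csv
import os
from contextlib import ExitStack
from itertools import zip_longest


def find_samples(root, name='sample.csv'):
    """Yield paths of files called `name` under `root` (hypothetical helper)."""
    for d, subdirs, files in os.walk(root):
        subdirs.sort()  # sorting in place makes os.walk descend in a stable order
        if name in files:
            yield os.path.join(d, name)


def second_column(f):
    """Lazily yield the 2nd field of each CSV row ('' if the row is too short)."""
    for row in csv.reader(f):
        yield row[1] if len(row) > 1 else ''


def stream_aggregate(root, out_path):
    """Write the 2nd columns of all samples side by side, one row at a time."""
    with ExitStack() as stack:
        # ExitStack closes every opened file when the block ends.
        files = [stack.enter_context(open(p, newline=''))
                 for p in find_samples(root)]
        out = stack.enter_context(open(out_path, 'w', newline=''))
        # zip_longest pads exhausted columns with '' so every row
        # has one cell per sample file.
        rows = zip_longest(*(second_column(f) for f in files), fillvalue='')
        csv.writer(out).writerows(rows)
```

For example, stream_aggregate('./Desktop', 'aggregate.csv') would write each output row as soon as it has been read, instead of building the whole matrix first.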

You can get all the info you need about the script package here.

Here is how to set the Program argument, using Packages->Script->Configure Script:

[Screenshot: Packages->Script->Configure Script]
