How to iterate through a directory and read in 2 files at each iteration using pathlib.Path().glob()

Question

Using pathlib.Path().glob(), how do we iterate through a directory and read in 2 files at each iteration?

Suppose my directory C:\Users\server\Desktop\Dataset looks like this:

P1_mean_fle.csv
P2_mean_fle.csv
P3_mean_fle.csv
P1_std_dev_fle.csv
P2_std_dev_fle.csv
P3_std_dev_fle.csv

If I want to read in only 1 file at each iteration of the Pi's, my code would look like this:

from pathlib import Path
import pandas as pd

file_path = r'C:\Users\server\Desktop\Dataset'
param_file = 'P*' + '_mean_fle.csv'

for i, fle in enumerate(Path(file_path).glob(param_file)):
    mean_fle = pd.read_csv(fle).values

    results = tuning(mean_fle)  # tuning is some function that takes the mean data
                                # and does something with it

Now, how do I read in 2 files at each iteration of the Pi's? The code below doesn't work because param_file can hold only one file-name pattern at a time. I would appreciate a way to do this using pathlib.

from pathlib import Path
import pandas as pd

param_file = 'P*' + '_mean_fle.csv'
param_file = 'P*' + '_std_dev_fle.csv'  #this is wrong

for i, fle in enumerate(Path(file_path).glob(param_file)):  #this is wrong inside the glob() part
    mean_fle = pd.read_csv(fle).values
    std_dev_fle = pd.read_csv(fle).values

    results = tuning(mean_fle, std_dev_fle)  #tuning is some function which takes in the two files mean 
                                             #and std_dev and does something with these 2 files

Thank you in advance.

Answer 1

If your filenames follow deterministic rules as in the example, your best bet is to iterate over one kind of file and derive the path of the corresponding file by string replacement.

from pathlib import Path
import pandas as pd

file_path = r'C:\Users\server\Desktop\Dataset'
param_file = 'P*' + '_mean_fle.csv'

for i, fle in enumerate(Path(file_path).glob(param_file)):
    stddev_fle = fle.with_name(fle.name.replace("mean", "std_dev"))
    mean_values = pd.read_csv(fle).values
    stddev_values = pd.read_csv(stddev_fle).values

    results = tuning(mean_values, stddev_values)
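If a std_dev file might be missing for some Pi, the same replacement idea can be wrapped in a small helper that skips incomplete pairs. This is only a sketch: companion_path and iter_pairs are hypothetical names, not part of pathlib, and the file-name suffixes are assumed to be exactly those shown in the question.

```python
from pathlib import Path


def companion_path(mean_path: Path) -> Path:
    """Return the std_dev path that sits next to a mean file."""
    # Replace only inside the file name, so a parent directory whose
    # name happens to contain "mean" is left untouched.
    return mean_path.with_name(
        mean_path.name.replace("_mean_fle", "_std_dev_fle")
    )


def iter_pairs(directory):
    """Yield (mean_path, std_dev_path), skipping means with no companion."""
    # sorted() orders paths as strings, so P10 comes before P2.
    for mean_path in sorted(Path(directory).glob("P*_mean_fle.csv")):
        std_path = companion_path(mean_path)
        if std_path.exists():
            yield mean_path, std_path
```

Each yielded pair can then be passed to pd.read_csv(...).values and on to tuning, as in the loop above.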

Answer 2

I suggest two approaches:

1.

If you are sure the numbering of your files has no gaps, you can skip glob and build the file names directly:

from pathlib import Path
import pandas as pd

data_dir = Path(r'C:\Users\server\Desktop\Dataset')
mean_csv_pattern = 'P{}_mean_fle.csv'
std_dev_pattern = 'P{}_std_dev_fle.csv'

i = 0
while True:
    i += 1
    try:
        mean_fle = pd.read_csv(data_dir / mean_csv_pattern.format(i)).values
        std_dev_fle = pd.read_csv(data_dir / std_dev_pattern.format(i)).values
    except FileNotFoundError:  # no file with this number, so stop
        break
    results = tuning(mean_fle, std_dev_fle)

2.

Use a pre-fetch step that collects all your files and puts them in a structure you can query in your main loop.

Glob for the mean files, glob for the std_dev files, take the number from each filename and build a dictionary {index: {'mean_file': mean_file, 'std_file': std_file}}, then loop over the sorted dictionary keys.
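The pre-fetch idea could look roughly like the sketch below. The names NAME_RE, build_index and iter_complete_pairs are made up for illustration, the regular expression assumes the exact naming scheme from the question, and a single glob is used for both kinds of file instead of two.

```python
import re
from pathlib import Path

# Assumed naming scheme: P<number>_mean_fle.csv / P<number>_std_dev_fle.csv
NAME_RE = re.compile(r"P(\d+)_(mean|std_dev)_fle\.csv")


def build_index(directory):
    """Map each file number to a dict like {'mean': Path, 'std_dev': Path}."""
    index = {}
    for path in Path(directory).glob("P*_fle.csv"):
        match = NAME_RE.fullmatch(path.name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        number, kind = int(match.group(1)), match.group(2)
        index.setdefault(number, {})[kind] = path
    return index


def iter_complete_pairs(index):
    """Yield (number, mean_path, std_dev_path) in numeric order."""
    for number in sorted(index):
        entry = index[number]
        # Skip numbers for which one of the two files is missing.
        if "mean" in entry and "std_dev" in entry:
            yield number, entry["mean"], entry["std_dev"]
```

Because the keys are integers, P2 is visited before P10, and gaps in the numbering are harmless. In the main loop, each pair of paths would go through pd.read_csv(...).values before being handed to tuning.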

2 answers

solution1
3 ACCEPTED 2020-04-15 04:53:40

solution2
1 2020-04-15 04:52:59
