Argparse - Making a list of inputs and outputs

Question

I'm new to Python.

My professor gave me a piece of code to help process some imagery. However, it only works on one image at a time, because an input and an output have to be specified for each run. Usually I would use os or glob, but argparse is new to me and my usual methods do not work.

I need to edit this so that it processes a list of '.hdf' files, with each output named the same as its input but ending in '_Processed.hdf'.

Code below:

# Import the numpy library
import numpy
# Import the GDAL library
from osgeo import gdal
# Import the GDAL/OGR spatial reference library
from osgeo import osr
# Import the HDF4 reader.
import pyhdf.SD
# Import the system library
import sys
# Import the python Argument parser
import argparse
import pprint
import rsgislib    

def creatGCPs(lat_arr, lon_arr):

    y_size = lat_arr.shape[0]
    x_size = lat_arr.shape[1]
    print(x_size)
    print(y_size)

    gcps = []
    for y in range(y_size):
        for x in range(x_size):
            gcps.append([x, y, lon_arr[y,x], lat_arr[y,x]])
    return gcps


def run(inputFile, outputFile):   
    hdfImg = pyhdf.SD.SD(inputFile)
    #print("Available Datasets")
    pprint.pprint(hdfImg.datasets())
    #print("Get Header Attributes")
    #attr = hdfImg.attributes(full=1)
    #pprint.pprint(attr)
    rsgisUtils = rsgislib.RSGISPyUtils()
    wktStr = rsgisUtils.getWKTFromEPSGCode(4326)
    #print(wktStr)

    lat_arr = hdfImg.select('Latitude')[:]
    long_arr = hdfImg.select('Longitude')[:]    
    sel_dataset_arr = hdfImg.select('Optical_Depth_Land_And_Ocean')[:]

    gcplst = creatGCPs(lat_arr, long_arr)


    y_size = lat_arr.shape[0]
    x_size = lat_arr.shape[1]

    min_lat = numpy.min(lat_arr)
    max_lat = numpy.max(lat_arr)
    min_lon = numpy.min(long_arr)
    max_lon = numpy.max(long_arr)

    lat_res = (max_lat-min_lat)/float(y_size)
    lon_res = (max_lon-min_lon)/float(x_size)

    driver = gdal.GetDriverByName( "KEA" )
    metadata = driver.GetMetadata()
    dst_ds = driver.Create( outputFile, x_size, y_size, 1, gdal.GDT_Float32 )
    dst_ds.GetRasterBand(1).WriteArray(sel_dataset_arr)

    gcp_list = []
    for gcp_arr in gcplst:
        # GCP(x, y, z, pixel, line): keep lon/lat as floats, int() would truncate them to whole degrees
        gcp = gdal.GCP(float(gcp_arr[2]), float(gcp_arr[3]), 0.0, gcp_arr[0], gcp_arr[1])
        gcp_list.append(gcp)

    dst_ds.SetGCPs(gcp_list, wktStr)

    dst_ds = None



if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str, required=True,  help="Specify the input image file.")
    # Define the argument for specifying the output file.
    parser.add_argument("-o", "--output", type=str, required=True, help="Specify the output image file.")
    args = parser.parse_args()


    run(args.input, args.output)

Answer 1

You can use the nargs='+' option. Since you are only going to have one required argument, I'd recommend not using --input as an option at all, and simply running the script as script_name.py input_file1 input_file2 input_file3 ... :

import os.path
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('input', nargs='+', help="Specify the input image file.")
    args = parser.parse_args()
    for filename in args.input:
        root, ext = os.path.splitext(filename)
        run(filename, ''.join((root, '_Processed', ext)))
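
A minimal, self-contained sketch of the same idea, with the image processing left out so the naming logic can be checked on its own. The helpers make_output_name and parse_inputs are hypothetical names, not part of the original script, and the explicit list passed to parse_inputs stands in for the command line:

```python
import argparse
import os.path


def make_output_name(filename):
    """Hypothetical helper: insert '_Processed' before the extension."""
    root, ext = os.path.splitext(filename)
    return root + '_Processed' + ext


def parse_inputs(argv=None):
    """Return the list of input files; argv=None means 'read sys.argv'."""
    parser = argparse.ArgumentParser()
    # nargs='+' collects one or more positional values into a list.
    parser.add_argument('input', nargs='+', help="One or more input image files.")
    return parser.parse_args(argv).input


# Simulate: script_name.py a.hdf data/b.hdf
for filename in parse_inputs(['a.hdf', 'data/b.hdf']):
    print(filename, '->', make_output_name(filename))
```

In the real script, the print call would be replaced by run(filename, make_output_name(filename)).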

Answer 2

According to the argparse docs, you can simply add nargs='*' to the argument definition. However, if you pass both input and output files this way, be sure to give them in the same order.
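
If both lists are given explicitly, one way to guard against a mismatch is to check their lengths and pair them with zip. This is only a sketch: the multi-valued -o/--output option and the parse_pairs helper are assumptions for illustration, and nargs='+' is used instead of nargs='*' so that each option needs at least one value.

```python
import argparse


def parse_pairs(argv=None):
    """Hypothetical helper: return (input, output) pairs from the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", nargs='+', required=True,
                        help="Specify the input image files.")
    parser.add_argument("-o", "--output", nargs='+', required=True,
                        help="Specify the output image files, in the same order.")
    args = parser.parse_args(argv)
    if len(args.input) != len(args.output):
        # parser.error prints the usage message and exits with status 2.
        parser.error("--input and --output need the same number of files")
    return list(zip(args.input, args.output))


# Simulate: script.py -i a.hdf b.hdf -o a_out.hdf b_out.hdf
for in_file, out_file in parse_pairs(['-i', 'a.hdf', 'b.hdf',
                                      '-o', 'a_out.hdf', 'b_out.hdf']):
    print(in_file, '->', out_file)
```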

Also, you can use the pathlib.Path object, which has been in the standard library since Python 3.4, to manipulate file names.

So with an added from pathlib import Path at the top, the last part of your code becomes:

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", nargs='*', type=str, required=True,  help="Specify the input image file.")

    args = parser.parse_args()

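    for input_file in args.input:
        # .stem drops the directory and the last extension, so the output
        # is written to the current working directory.
        output_file = Path(input_file).stem + '_Processed.hdf'
        run(input_file, output_file)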

Here, args.input is now a list of strings, so we iterate over it. The .stem attribute returns the file name without its final extension. I find it cleaner than slicing off a fixed number of characters (something like name[:-4]), which only works for specific extension lengths.
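
Note that .stem also drops the directory part, so the output name built in the code above is relative to the current working directory. If the output should sit next to its input instead, a possible variant uses Path.with_name. The helper name processed_path is hypothetical:

```python
from pathlib import Path


def processed_path(input_file):
    """Hypothetical helper: same folder as the input, named <stem>_Processed.hdf."""
    in_path = Path(input_file)
    # with_name replaces only the final path component and keeps the parent folder.
    return in_path.with_name(in_path.stem + '_Processed.hdf')


print(processed_path('data/Image_01.hdf'))
```

Since run() hands the output name on to other libraries, it is probably safest to pass str(processed_path(input_file)) rather than the Path object itself.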

This works well with glob patterns in a standard Linux shell, where the shell expands the pattern before the script sees it (I don't know about other cases).

For example, calling python this_script.py -i Image_* processes every file whose name begins with "Image_".
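
Where the shell does not expand wildcards itself (the Windows command prompt, for instance, passes Image_* through literally), the script can expand them with the standard glob module. A sketch, with expand_patterns as a hypothetical helper name:

```python
import glob


def expand_patterns(patterns):
    """Expand wildcard patterns; items that match nothing are kept unchanged."""
    files = []
    for pattern in patterns:
        # sorted() gives a stable order; glob.glob itself makes no ordering promise.
        matches = sorted(glob.glob(pattern))
        files.extend(matches if matches else [pattern])
    return files
```

The loop over the arguments would then iterate over expand_patterns(args.input) instead of args.input. On a shell that has already expanded the pattern, each argument is a plain file name and passes through unchanged, provided the name itself contains no wildcard characters.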

solution1
0 2018-08-16 13:02:13

solution2
0 ACCEPTED 2018-08-16 13:15:32
