
Parse command line with state in Python

Some program command lines have state that applies to the arguments following the state-setting option (the '--' argument to rm and touch is an example; ffmpeg is infamous for stateful arg parsing). For example:

readmulticsv --cols 1,2,4 file1.csv --date_format "%Y-%m-%d" file2.csv --cols 4,3,9 file3.csv file4.csv

Here, it will pull columns 1, 2, and 4 from file1.csv and file2.csv and then pull columns 4, 3, and 9, in that order, from file3.csv and file4.csv. Further, it will start interpreting dates (in the first column of the --cols argument) in file1.csv with a default "%m/%d/%Y" format, but switch to "%Y-%m-%d" for the remaining files. What I want is a list of lists, where each element list has the file name and the values of the relevant state variables:

[["file1.csv", "1,2,4", "%m/%d/%Y"],
 ["file2.csv", "1,2,4", "%Y-%m-%d"],
...
]

Implementing this is straightforward if you walk sys.argv manually.

Is there a way to do this with argparse? My program uses argparse for many other options and its nice help feature, and the whole code is written around its Namespace object. I could use parse_known_args() and leave the rest for a "walk" approach, but that excludes --cols, --date_format, and the files from the help and Namespace. I've tried figuring out an Action(), but I'm not sure how to proceed there. The docs for setting that up aren't super clear to me and I don't see how to access the existing state.

Is there an alternative arg parser that can do it all (help, defaults, a Namespace)?

My application is a program to calculate stock basis, gain, and growth by reading CSV transaction files, where the investments have transferred between brokers with different file formats and format changes over several decades. I could write a converter for each of the old formats, but I'd rather write a single program that works directly from the source data.

Thanks,

--jh--

One of the big things that argparse adds, compared to the earlier optparse and getopt, is the ability to handle positionals. It uses a regex-like syntax and pattern matching to allocate strings (from the sys.argv list) to positionals and to optional (flagged) arguments.

The basic parsing routine alternates between parsing positionals and parsing an optional.

With:

--cols 1,2,4 file1.csv --date_format "%Y-%m-%d" file2.csv --cols 4,3,9 file3.csv file4.csv

I can imagine defining a

parser.add_argument('--cols', nargs='+', action='append')
parser.add_argument('--date_format', nargs='+', action='append')

resulting in

args.cols = [['1,2,4','file1.csv'], ['4,3,9', 'file3.csv', 'file4.csv']]
args.date_format = [["%Y-%m-%d", 'file2.csv']]

argparse does not retain any information on how the cols and date_format options are interleaved.

I was tempted to collect the 'file' names in positionals, but there isn't a way of ordering successive positionals between each optional.

In a recent SO answer I suggested prepopulating args with lists, e.g.

 argparse.Namespace(cols=[[]], date_format=[["%m/%d/%Y"]])

and changing the cols action to replace the last empty list. A new date_format would update both cols and date_format to start a new "state".

Python using diferent options multiple times with argparse

In the default Action subclasses, __call__ writes the new value(s) to the attribute (with setattr), overwriting the default or whatever was written before. The append subclass fetches the attribute (getattr), appends to it, and writes it back. The default classes only work with their own dest.

The only "state" that the Action has access to is the namespace. But that's probably enough if you design the custom action subclasses to fetch and save the appropriate attributes. Custom actions can even write and read attributes that aren't spelled out in the add_argument calls. (In the documentation, set_defaults is used to add a function attribute for subparsers.)
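As a rough sketch of that idea (not a tested solution for the real program), the action below treats the first value after --cols or --date_format as the new setting and every following value as a file name, recording each file together with a snapshot of the current state. The names StatefulOption, parse_stateful, and jobs are made up for this illustration, and the sketch assumes every file follows at least one of the two options.

```python
import argparse


class StatefulOption(argparse.Action):
    """Hypothetical action: first value is the setting, the rest are files."""

    def __call__(self, parser, namespace, values, option_string=None):
        # With nargs='+', values is a list with at least one element.
        setting, *files = values
        setattr(namespace, self.dest, setting)
        # The namespace is the shared state: read both settings from it.
        for name in files:
            namespace.jobs.append([name, namespace.cols, namespace.date_format])


def parse_stateful(argv):
    parser = argparse.ArgumentParser(prog="readmulticsv")
    parser.add_argument("--cols", nargs="+", action=StatefulOption,
                        help="column list, then files it applies to")
    parser.add_argument("--date_format", nargs="+", action=StatefulOption,
                        default="%m/%d/%Y",
                        help="date format, then files it applies to")
    # Prepopulate the namespace with a fresh list so repeated calls
    # do not share one mutable default.
    return parser.parse_args(argv, namespace=argparse.Namespace(jobs=[]))


args = parse_stateful([
    "--cols", "1,2,4", "file1.csv",
    "--date_format", "%Y-%m-%d", "file2.csv",
    "--cols", "4,3,9", "file3.csv", "file4.csv",
])
for job in args.jobs:
    print(job)
```

With the command line from the question, args.jobs holds one [file, cols, date_format] row per file, in command-line order. A file given before any option would not be picked up by this sketch.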

Another customization approach is to define a new Namespace class. The default one is simple, with just a means of displaying itself. Where possible argparse uses getattr, hasattr, and setattr to interact with the namespace, so it imposes minimal constraints on that class.

So between type functions, action subclasses, namespace classes, and the formatter there's a lot of room for customizing argparse. But you do need to study the argparse.py code, and recognize that there's little you can do to change the basic parsing sequence.
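To illustrate the namespace route, here is a minimal sketch of a Namespace subclass that logs every attribute write in order, which recovers how the options were interleaved. RecordingNamespace and its history attribute are hypothetical names, and the sketch relies on the stock store action writing through setattr; argument_default=argparse.SUPPRESS is used only to keep default values out of the log.

```python
import argparse


class RecordingNamespace(argparse.Namespace):
    """Hypothetical namespace that remembers the order of attribute writes."""

    def __init__(self, **kwargs):
        # Bypass our own __setattr__ so the log itself is not logged.
        object.__setattr__(self, "history", [])
        super().__init__(**kwargs)

    def __setattr__(self, name, value):
        self.history.append((name, value))
        super().__setattr__(name, value)


# SUPPRESS means unset options are never written, so only real
# command-line occurrences end up in the history.
parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
parser.add_argument("--cols", nargs="+")
parser.add_argument("--date_format", nargs="+")

ns = parser.parse_args(
    ["--cols", "1,2,4", "file1.csv",
     "--date_format", "%Y-%m-%d", "file2.csv",
     "--cols", "4,3,9", "file3.csv", "file4.csv"],
    namespace=RecordingNamespace(),
)
for name, value in ns.history:
    print(name, value)
```

The ordinary attributes still end up holding only the last value for each option; the ordered history is what a post-processing step would walk to rebuild the per-file state.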

Processing sys.argv before parsing is another tool, as is post processing the args namespace.
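A sketch of the pre-processing route: walk the argument list once, track the state, hand everything else to argparse, and attach the result to the namespace afterwards. The function split_stateful, the jobs attribute, the --verbose flag, and the rule that file names end in ".csv" are all assumptions made for this example.

```python
import argparse

# Hypothetical state-setting options and their starting values.
STATE_DEFAULTS = {"--cols": None, "--date_format": "%m/%d/%Y"}


def split_stateful(argv):
    """Return (jobs, rest): per-file state rows and the untouched arguments."""
    state = dict(STATE_DEFAULTS)
    jobs, rest = [], []
    tokens = iter(argv)
    for token in tokens:
        if token in state:
            value = next(tokens, None)  # the option's value follows it
            if value is None:
                raise ValueError(f"{token} requires a value")
            state[token] = value
        elif token.endswith(".csv"):  # assumption: this marks a data file
            jobs.append([token, state["--cols"], state["--date_format"]])
        else:
            rest.append(token)
    return jobs, rest


parser = argparse.ArgumentParser(prog="readmulticsv")
parser.add_argument("--verbose", action="store_true")  # stand-in for other options

jobs, rest = split_stateful(
    ["--cols", "1,2,4", "file1.csv", "--verbose",
     "--date_format", "%Y-%m-%d", "file2.csv"]
)
args = parser.parse_args(rest)
args.jobs = jobs  # post-process: attach the walked state to the namespace
print(args)
```

This keeps the rest of the program working with a single Namespace, but --cols and --date_format would still be missing from the generated help unless they are described by hand, for example in the parser's epilog.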
