
Extracting values from a deeply nested data structure in haskell

I've been trying to work out how to use the language-bash package to parse some simple bash scripts, and I've come across the following structure

Right (List [Statement (Last (Pipeline {timed = False, timedPosix = False, inverted = False, commands = [Command (SimpleCommand [Assign (Parameter "x" Nothing) Equals (RValue [Char '3'])] []) []]})) Sequential])

as a result of running

import Language.Bash.Parse
parse "" "x=3"

I could theoretically just pattern match the whole thing away, though I was wondering if there was a cleaner way of accessing the values of the Assign datatype ("x" and Char '3').

Is there any way to cleanly access those values (or generally access values in a complex data structure) without obsessive pattern matching?

Not really.

Here's the problem. You probably want to either handle an extremely limited set of possible Bash statements, in which case just writing out the patterns for specific List values will be faster than anything else you could possibly do.

Or, you want to handle a wide variety of Bash statements, in which case you can't really avoid the functional infrastructure to handle general List values. The same way you'd write an interpreter or compiler for any complex abstract syntax tree, you'll end up more or less writing a function for every (major) type and a case for every constructor.
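For the first, narrow case, the pattern can simply be written out. The following is only a sketch: singleAssign is a hypothetical helper name, the constructor shapes are copied from the parse output shown in the question, and it assumes that Language.Bash.Syntax exports the syntax-tree constructors. Language.Bash.Word is imported qualified because its Word type would otherwise clash with the Prelude's Word.

```haskell
import Language.Bash.Parse (parse)
import Language.Bash.Syntax
import qualified Language.Bash.Word as W

-- Hypothetical helper: recognise a script that consists of exactly one
-- plain assignment (such as "x=3") and return its name and value.
-- Anything else, including "x+=3" or an array assignment, gives Nothing.
singleAssign :: List -> Maybe (String, W.Word)
singleAssign
  (List
    [ Statement
        (Last (Pipeline { commands =
          [ Command
              (SimpleCommand
                [Assign (W.Parameter name Nothing) Equals (RValue value)]
                [])
              []
          ] }))
        Sequential
    ])
  = Just (name, value)
singleAssign _ = Nothing

main :: IO ()
main = case parse "" "x=3" of
  Left err  -> print err
  Right lst -> print (singleAssign lst)
```

For the input from the question this should print Just ("x",[Char '3']). The pattern is long, but it is a single clause, and every script that does not have exactly this shape falls through to Nothing.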

The main Haskell tools for dealing with big, complex data structures are:

  • The "functional infrastructure" described above. That is, plain old functions defined using pattern matching, that process recursive data structures in a manner that mirrors the structures themselves. Don't underestimate this approach. It may seem like a lot of work, but it's likely to lead you to a correct program that handles all well-formed inputs, in a way that ad hoc approaches won't. Start with:

     {-# OPTIONS_GHC -Wall #-}

     data M = ... some monad ...
     data Result = ... representation of what you want to extract from the script ...

     processList :: List -> M Result
     ...
     processStatement :: Statement -> M Result
     ...

    and go from there. The -Wall is important to get the -Wincomplete-patterns warning so you don't miss any constructors.

  • Lenses, which provide a more ergonomic hierarchical syntax for referring to parts of deeply nested data structures. Since language-bash doesn't provide lenses for these structures, you'd need to write them yourself. They might allow you to write something along the lines of:

     lst ^. _Right . statements . _head . andOr . pipeline . commands . _head . _SimpleCommand . assignments . _head . parameter . base

    to extract the "x" from "x=3". Obviously, that doesn't help much, but lenses complement the "functional infrastructure" approach. The code to actually process all those types is often more easily expressed with lenses than pattern matching.

  • Generics, which allow you to generically access certain patterns within recursive data structures, while ignoring the "rest" of the data structure that you don't care about. The language-bash library includes deriving clauses for both Data and Generic generics. If it didn't, you could use StandaloneDeriving clauses to derive them. As an example, you can use Data generics to extract all Parameters from a List, regardless of where those Parameters appear, with something like:

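     import Language.Bash.Parse
     import Language.Bash.Word
     import Data.Data
     -- Data.Generics.Schemes and Data.Generics.Aliases come from the syb package
     import Data.Generics.Schemes
     import Data.Generics.Aliases

     -- Collect every Parameter found anywhere inside a value with a Data instance
     parameters :: (Data a) => a -> [Parameter]
     parameters = everything (++) (mkQ [] (\p -> [p]))

     main = do
       let Right lst = parse "" "x=3; y=4; LANG=C echo $x $y"
       print $ parameters lst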

    Here, this prints out a list of all parameters appearing in this shell "script", whether for purposes of assignment or substitution, so it includes: "x", "y", "LANG", and "x" and "y" again.

    This is a powerful tool, but it's likely to be applicable to only a few specific use-cases.
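As a rough illustration of the first bullet, here is what "one function per type, one case per constructor" might look like when the goal is only to collect plain assignments. All function names and the Binding alias are hypothetical, and the And, Or, Subshell and Group constructors are written as I believe language-bash defines them, so check them against the version you have installed.

```haskell
{-# OPTIONS_GHC -Wall #-}
import Language.Bash.Parse (parse)
import Language.Bash.Syntax
import qualified Language.Bash.Word as W

-- Hypothetical result type: a variable name and the word assigned to it.
type Binding = (String, W.Word)

listBindings :: List -> [Binding]
listBindings (List stmts) = concatMap statementBindings stmts

statementBindings :: Statement -> [Binding]
statementBindings (Statement andOr _) = andOrBindings andOr

andOrBindings :: AndOr -> [Binding]
andOrBindings (Last p)     = pipelineBindings p
andOrBindings (And p rest) = pipelineBindings p ++ andOrBindings rest
andOrBindings (Or p rest)  = pipelineBindings p ++ andOrBindings rest

pipelineBindings :: Pipeline -> [Binding]
pipelineBindings = concatMap commandBindings . commands

commandBindings :: Command -> [Binding]
commandBindings (Command sc _) = shellCommandBindings sc

shellCommandBindings :: ShellCommand -> [Binding]
shellCommandBindings (SimpleCommand assigns _) =
  -- Array assignments do not match the RValue pattern and are skipped.
  [ (n, v) | Assign (W.Parameter n _) _ (RValue v) <- assigns ]
shellCommandBindings (Subshell l) = listBindings l
shellCommandBindings (Group l)    = listBindings l
-- Stub: a real implementation would list the remaining constructors
-- (loops, conditionals, function definitions, ...) one by one, so that
-- -Wincomplete-patterns can flag any that are forgotten.
shellCommandBindings _ = []

main :: IO ()
main = case parse "" "x=3; y=4; LANG=C echo $x $y" of
  Left err  -> print err
  Right lst -> mapM_ print (listBindings lst)
```

The catch-all clause keeps the sketch short, but it also silences the incomplete-pattern warning for ShellCommand, which is exactly the check the -Wall advice above is meant to provide. Replacing it with explicit cases is the first thing to do when growing this into real code.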

Ultimately, you'll probably have to take the view that you are writing a Bash interpreter (even if your interpreter does something besides "executing" the Bash script). Someone's been nice enough to supply a Bash parser to get the input source code into an AST, but the other half of the interpreter -- the actual interpretation itself -- still needs to be written by you.


 