简体   繁体   中英

non-lawful Monoid instances for building up AST not considered harmful?

I've seen a data type defined like the following with a corresponding Monoid instance:

data Foo where
  FooEmpty :: String -> Foo
  FooAppend :: Foo -> Foo -> Foo

-- | Create a 'Foo' with a specific 'String'.
foo :: String -> Foo
foo = FooEmpty

instance Monoid Foo where
  mempty :: Foo
  mempty = FooEmpty ""

  mappend :: Foo -> Foo -> Foo
  mappend = FooAppend

You can find the full code in a gist on Github.

This is how Foo can be used:

exampleFoo :: Foo
exampleFoo =
  (foo "hello" <> foo " reallylongstringthatislong") <>
  (foo " world" <> mempty)

exampleFoo ends up as a tree that looks like this:

FooAppend
  (FooAppend
    (FooEmpty "hello")
    (FooEmpty " reallylongstringthatislong"))
  (FooAppend
    (FooEmpty " world")
    (FooEmpty ""))

Foo can be used to turn sequences of Monoid operations ( mempty and mappend ) into an AST. This AST can then be interpreted into some other Monoid .

For instance, here is a translation of Foo into a String that makes sure the string appends will happen optimally:

fooInterp :: Foo -> String
fooInterp = go ""
  where
    go :: String -> Foo -> String
    go accum (FooEmpty str) = str ++ accum
    go accum (FooAppend foo1 foo2) = go (go accum foo2) foo1

This is really nice. It is convenient that we can be sure String appends will happen in the right order. We don't have to worry about left-associated mappend s.

However, the one thing that worries me is that the Monoid instance for Foo is not a legal Monoid instance.

For instance, take the first Monoid law:

mappend mempty x = x

If we let x be FooEmpty "hello" , we get the following:

mappend mempty (FooEmpty "hello") = FooEmpty "hello"
mappend (FooEmpty "") (FooEmpty "hello") = FooEmpty "hello"  -- replace mempty with its def
FooAppend (FooEmpty "") (FooEmpty "hello") = FooEmpty "hello" -- replace mappend with its def

You can see that FooAppend (FooEmpty "") (FooEmpty "hello") does not equal FooEmpty "hello" . The other Monoid laws also don't hold for similar reasons.

Haskellers are usually against non-lawful instances. But I feel like this is a special case. We are just trying to build up a structure that can be interpreted into another Monoid . In the case of Foo , we can make sure that the Monoid laws hold for String in the fooInterp function.

  1. Is it ever okay to use these types of non-lawful instances to build up an AST?
  2. Are there any specific problems that need to be watched for when using these types of non-lawful instances?
  3. Is there an alternative way to write code that uses something like Foo ? Some way to enable interpretation of a monoidal structure instead of using mappend on a type directly?

Quoting this answer on a similar question :

You can think of it from this alternative point of view: the law (a <> b) <> c = a <> (b <> c) doesn't specify which equality should be used, ie what specific relation the = denotes. It is natural to think of it in terms of structural equality, but note that very few typeclass laws actually hold up to structural equality (eg try proving fmap id = id for [] as opposed to forall x . fmap id x = id x).

For example, it's mostly fine if you do not export the constructors of Foo , and only export functions that, from the point of view of users, behave as if Foo were a monoid. But most of the time it is possible to come up with a representation that's structurally a monoid, good enough in practice, though maybe not as general (below, you cannot reassociate arbitrarily after the fact, because interpretation is mixed with construction).

type Foo = Endo String

foo :: String -> Foo
foo s = Endo (s <>)

unFoo :: Foo -> String
unFoo (Endo f) = f ""

( Data.Monoid.Endo )


Here is another SO question where a non-structural structure ( Alternative ) is considered at first.

This will come up for most non-trivial data structures. The only exceptions I can think of off the top of my head are (some) trie-like structures.

  1. Balanced tree data structures allow multiple balancings of most values. This is true of AVL trees, red-black trees, B-trees, 2-3 finger trees, etc.

  2. Data structures designed around "rebuilding", such as Hood-Melville queues, allow variable amounts of duplication within structures representing most values.

  3. Data structures implementing efficient priority queues allow multiple arrangements of elements.

  4. Hash tables will arrange elements differently depending on when collisions occur.

None of these structures can be asymptotically as efficient without this flexibility. The flexibility, however, always breaks laws under the strictest interpretation. In Haskell, the only good way to deal with this is by using the module system to make sure no one can detect the problem. In experimental dependently typed languages, researchers have been working on things like observational type theory and homotopy type theory to find better ways to talk about "equality", but that research is pretty far from becoming practical.

Is it ever okay to use these types of non-lawful instances to build up an AST?

This is a matter of opinion. (I'm firmly in the 'never ok' camp.)

Are there any specific problems that need to be watched for when using these types of non-lawful instances?

  • cognitive burden placed on potential users and future maintainers
  • potential bugs because we use the type in a place that makes assumptions based on the broken law(s)

edit to answer questions in comments:

Would you be able to come up with specific examples of how it raises the cognitive burden on users?

Imagine how annoyed you would be if someone did this in C:

 // limit all while loops to 10 iterations
 #define while(exp) for(int i = 0; (exp) && i < 10; ++i)

Now we have to keep track of the scope of this pseudo-while definition and its implications. It's a non-Haskell example, but I think the principle is the same. We shouldn't expect the semantics of while to be different in a particular source file just like we shouldn't expect the semantics of Monoid to be different for a particular data type.

When we say something is an X, then it should be a X because people understand the semantics of X. The principle here is don't create exceptions to well understood concepts .

I think the point of using lawful abstractions (like monoid) in the first place is to alleviate the need for programmers to learn and remember a myriad of different semantics. Thus, every exception we create undermines this goal. In fact, it makes it worse; we have to remember the abstraction and on top of that remember all the exceptions. (As an aside, I admire but pity those who learned English as a second language.)

Or how it can lead to potential bugs?

some library:

-- instances of this class must have property P
class AbidesByP where
   ...

-- foo relies on the property P
foo :: AbidesByP a => a -> Result
foo a = ... 

my code:

data MyData = ...

-- note: AbidesByP's are suppose to have property P, but this one doesn't
instance AbidesByP MyData where
   ...

some other programmer (or me in a few months):

doSomethingWithMyData :: MyData -> SomeResult
doSomethingWithMyData x = let ...
                              ...
                              ...
                              r = foo x      -- potential bug
                              ...
                              ...
                           in ...

Is there an alternative way to write code that uses something like Foo?

I'd probably just use the constructor to contruct:

(foo "hello" `FooAppend` foo " reallylongstringthatislong") `FooAppend` (foo " world" `FooAppend` foo "")

or make an operator:

(<++>) = FooAppend

(foo "hello" <++> foo " reallylongstringthatislong") <++> (foo " world" <++> foo "")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM