QuickCheck: How to use the exhaustiveness checker to prevent forgotten constructors of a sum type

I have a Haskell data type like:

data Mytype
  = C1
  | C2 Char
  | C3 Int String

If I case on a Mytype and forget to handle one of the cases, GHC gives me a warning (exhaustiveness check).

I now want to write a QuickCheck Arbitrary instance to generate Mytype values like:

instance Arbitrary Mytype where
  arbitrary = do
    n <- choose (1, 3 :: Int)
    case n of
      1 -> pure C1
      2 -> C2 <$> arbitrary
      3 -> C3 <$> arbitrary <*> someCustomGen

The problem with this is that I can add a new alternative to Mytype and forget to update the Arbitrary instance, so my tests never exercise that alternative.

I would like to find a way of using GHC's exhaustiveness checker to remind me of forgotten cases in my Arbitrary instance.

The best I've come up with is:

arbitrary = do
  x <- elements [C1, C2 undefined, C3 undefined undefined]
  case x of
    C1     -> pure C1
    C2 _   -> C2 <$> arbitrary
    C3 _ _ -> C3 <$> arbitrary <*> someCustomGen

But it doesn't really feel elegant.

I intuitively feel that there's no 100% clean solution to this, but would appreciate anything that reduces the chance of forgetting such cases, especially in a big project where code and tests are separated.

I implemented a solution with TemplateHaskell; you can find a prototype at https://gist.github.com/nh2/d982e2ca4280a03364a8 . With this you can write:

instance Arbitrary Mytype where
  arbitrary = oneof $(exhaustivenessCheck ''Mytype [|
      [ pure C1
      , C2 <$> arbitrary
      , C3 <$> arbitrary <*> arbitrary
      ]
    |])

It works like this: You give it a type name (like ''Mytype) and an expression (in my case a list of arbitrary-style Gens). It gets the list of all constructors for that type name and checks whether the expression mentions each of these constructors at least once. If you just added a constructor but forgot to add it to the Arbitrary instance, this function will tell you at compile time (it calls fail, so compilation stops with the "missing case" message).

This is how it's implemented with TH:

exhaustivenessCheck :: Name -> Q Exp -> Q Exp
exhaustivenessCheck tyName qList = do
  tyInfo <- reify tyName
  -- Bind in Q rather than in a pure `let`, so that `fail` really aborts compilation.
  -- DataD has 5 fields here (GHC 7.8); newer template-haskell releases add a kind field.
  -- conNameOf is a helper not shown here; it should return the Name of a Con.
  conNames <- case tyInfo of
    TyConI (DataD _cxt _name _tyVarBndrs cons _derives) -> return (map conNameOf cons)
    _ -> fail "exhaustivenessCheck: Can only handle simple data declarations"

  list <- qList
  case list of
    input@(ListE l) -> do
      -- We could be more specific by searching for `ConE`s in `l`
      let cons = toListOf tinplate l :: [Name]
      case filter (`notElem` cons) conNames of
        [] -> return input
        missings -> fail $ "exhaustivenessCheck: missing case: " ++ show missings
    _ -> fail "exhaustivenessCheck: argument must be a list"

I'm using GHC.Generics to easily traverse the syntax tree of the Exp: with toListOf tinplate exp :: [Name] (from lens) I can easily find all Names in the whole exp.
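For readers who would rather avoid the lens dependency and the orphan Generic instances, the same "collect every Name" traversal can be sketched with Data.Data from base, assuming the Data instances that template-haskell provides for its AST types. This is not what the gist does; namesIn and exampleList are hypothetical names introduced only for this illustration.

```haskell
import Data.Data (Data, cast, gmapQ)
import Language.Haskell.TH (Exp (..), Name, mkName)

-- | Collect every 'Name' occurring anywhere inside a value, in pre-order.
-- Hypothetical helper (not part of the gist): it relies only on 'Data'.
namesIn :: Data a => a -> [Name]
namesIn x = maybe [] (: []) (cast x) ++ concat (gmapQ namesIn x)

-- | A hand-built stand-in for the quoted list
-- [| [pure C1, C2 <$> arbitrary] |], so no splice is needed to try it out.
exampleList :: Exp
exampleList =
  ListE
    [ AppE (VarE (mkName "pure")) (ConE (mkName "C1"))
    , InfixE
        (Just (ConE (mkName "C2")))
        (VarE (mkName "<$>"))
        (Just (VarE (mkName "arbitrary")))
    ]
```

Like the tinplate version, this picks up variable names as well as constructor names, which is harmless here because the result is only used to test whether each constructor name is present.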

I was surprised that the types from Language.Haskell.TH do not have Generic instances, and neither (with current GHC 7.8) do Integer or Word8. Generic instances for these are required because they appear in Exp. So I added them as orphan instances (for most things, StandaloneDeriving does it, but for primitive types like Integer I had to copy-paste the instances that Int has).

The solution is not perfect because it doesn't use the exhaustiveness checker like case does, but as we agree, that's not possible while staying DRY, and this TH solution is DRY.

One possible improvement/alternative would be to write a TH function that does this check for all Arbitrary instances in a whole module at once instead of calling exhaustivenessCheck inside each Arbitrary instance.

You want to ensure that your code behaves in a particular way; the simplest way to check the behaviour of code is to test it.

In this case, the desired behaviour is that each constructor gets reasonable coverage in tests. We can check that with a simple test:

allCons xs = length xs > 100 ==> length constructors == 3
             where constructors = nubBy eqCons xs
                   eqCons  C1       C1      = True
                   eqCons  C1       _       = False
                   eqCons (C2 _)   (C2 _)   = True
                   eqCons (C2 _)    _       = False
                   eqCons (C3 _ _) (C3 _ _) = True
                   eqCons (C3 _ _)  _       = False

This is pretty naive, but it's a good first shot. Its advantages:

  • eqCons will trigger an exhaustiveness warning if new constructors are added, which is what you want
  • It checks that your instance is handling all constructors, which is what you want
  • It also checks that all constructors are actually generated with some useful probability (in this case at least 1%)
  • It also checks that your instance is usable, e.g. doesn't hang

Its disadvantages:

  • Requires a large amount of test data, since only lists with length > 100 pass the precondition
  • eqCons is quite verbose, since a catch-all eqCons _ _ = False would bypass the exhaustiveness check
  • Uses magic numbers 100 and 3
  • Not very generic

There are ways to improve this, e.g. we can compute the constructors using the Data.Data module:

allCons xs = sufficient ==> length constructors == consCount
             where sufficient   = length xs > 100 * consCount
                   constructors = nub . map toConstr $ xs
                   consCount    = length . dataTypeConstrs . dataTypeOf $ head xs

This loses the compile-time exhaustiveness check, but that check is redundant as long as we run the tests regularly, and in exchange our code has become more generic.
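To see the Data.Data pieces in isolation, here is a small sketch that needs only base. It assumes Mytype derives Data; seenConstructors, allConstructors and coversAll are hypothetical helper names introduced for the illustration, not functions from any library.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data (Data, dataTypeConstrs, dataTypeOf, showConstr, toConstr)
import Data.List (nub)

data Mytype
  = C1
  | C2 Char
  | C3 Int String
  deriving (Show, Data)

-- | Names of the constructors that actually occur in a sample of values.
seenConstructors :: Data a => [a] -> [String]
seenConstructors = nub . map (showConstr . toConstr)

-- | Names of all constructors declared for the argument's type.
-- For a derived 'Data' instance, 'dataTypeOf' does not inspect its argument.
allConstructors :: Data a => a -> [String]
allConstructors = map showConstr . dataTypeConstrs . dataTypeOf

-- | True when every declared constructor occurs at least once in the sample.
coversAll :: Data a => [a] -> Bool
coversAll [] = False
coversAll xs@(x : _) = all (`elem` seenConstructors xs) (allConstructors x)
```

For example, coversAll [C1, C2 'a'] is False because C3 never occurs, and adding a fourth constructor to Mytype changes allConstructors without any edit to the helpers.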

If we really want the exhaustiveness check, there are a few places where we could shoe-horn it back in:

allCons xs = sufficient ==> length constructors == consCount
             where sufficient   = length xs > 100 * consCount
                   constructors = nub . map toConstr $ xs
                   consCount    = length . dataTypeConstrs . dataTypeOf $ case head xs of
                                                                   x@(C1)     -> x
                                                                   x@(C2 _)   -> x
                                                                   x@(C3 _ _) -> x

Notice that we use consCount to eliminate the magic 3 completely. The magic 100 (which determines the minimum required frequency of a constructor) now scales with consCount, but that just requires even more test data!

We can solve that quite easily using a newtype:

-- Requires Mytype to derive Data (and Eq, Show for the newtype's deriving clause).
consCount = length (dataTypeConstrs (dataTypeOf C1))

newtype MyTypeList = MTL [Mytype] deriving (Eq,Show)

instance Arbitrary MyTypeList where
  arbitrary = MTL <$> vectorOf (100 * consCount) arbitrary
  shrink (MTL xs) = MTL <$> shrink xs

allCons (MTL xs) = length constructors == consCount
                   where constructors = nub . map toConstr $ xs

We can put a simple exhaustiveness check in there somewhere if we like, e.g.

instance Arbitrary MyTypeList where
  arbitrary = MTL <$> vectorOf (100 * consCount) getT
              where getT = do x <- arbitrary
                              return $ case x of
                                            C1     -> x
                                            C2 _   -> x
                                            C3 _ _ -> x
  shrink (MTL xs) = MTL <$> shrink xs

Here I exploit an unused variable _x. This is not really more elegant than your solution, though.

instance Arbitrary Mytype where
  arbitrary = do
    let _x = case _x of C1 -> _x ; C2 _ -> _x ; C3 _ _ -> _x
    n <- choose (1, 3 :: Int)
    case n of
      1 -> pure C1
      2 -> C2 <$> arbitrary
      3 -> C3 <$> arbitrary <*> someCustomGen

Of course, one has to keep the last case coherent with the dummy definition of _x, so it is not completely DRY.

Alternatively, one might exploit Template Haskell to build a compile-time assert checking that the constructors in Data.Data.dataTypeOf are the expected ones. This assert has to be kept coherent with the Arbitrary instance, so this is not completely DRY either.
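A minimal sketch of such a compile-time assert follows. It uses reify rather than Data.Data.dataTypeOf, because that lets the check sit in the same module as the type, and it assumes a template-haskell version in which DataD has six fields (2.11 or later). The hard-coded list of expected names is the part that has to be kept in sync by hand.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH

data Mytype
  = C1
  | C2 Char
  | C3 Int String

-- Top-level declaration splice: it generates no declarations, but it aborts
-- compilation when the constructors of Mytype differ from the expected list.
-- It must come after the data declaration so that reify can see the type.
$(do
    info <- reify ''Mytype
    names <- case info of
      TyConI (DataD _ _ _ _ cons _) ->
        -- Only plain constructors are handled; records and GADTs are skipped.
        return [nameBase n | NormalC n _ <- cons]
      _ -> fail "constructor check: expected a plain data declaration"
    let expected = ["C1", "C2", "C3"]
    if names == expected
      then return []
      else fail ("constructor check: Mytype now has " ++ show names))
```

Placing this splice next to the Arbitrary instance means that adding a constructor to Mytype stops the test module from compiling until both the list and the instance are updated.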

If you do not need custom generators, I believe Data.Data can be exploited to generate Arbitrary instances via Template Haskell (I think I saw some code doing exactly that, but I can't remember where). In this way, there's no chance the instance can miss a constructor.

Here is a solution using the generic-random library:

{-# language DeriveGeneric #-}
{-# language TypeOperators #-}

import Generic.Random
import GHC.Generics
import Test.QuickCheck

data Mytype
  = C1
  | C2 Char
  | C3 Int String
  deriving Generic

instance Arbitrary Mytype where
  arbitrary = genericArbitraryG customGens uniform
    where
      customGens :: Gen String :+ ()
      customGens = someCustomGen :+ ()

someCustomGen :: Gen String
someCustomGen = undefined

genericArbitraryG takes care of generating each constructor of Mytype. In this case we use uniform to get a uniform distribution of constructors. With customGens we specify that each String field in Mytype is generated with someCustomGen.

See Generic.Random.Tutorial for more examples.
