Compiling very large constants with GHC

Today I asked GHC to compile an 8MB Haskell source file. GHC thought about it for about 6 minutes, swallowing almost 2GB of RAM, and then finally gave up with an out-of-memory error.

[As an aside, I'm glad GHC had the good sense to abort rather than floor my whole PC.]

Basically I've got a program that reads a text file, does some fancy parsing, builds a data structure and then uses show to dump this into a file. Rather than include the whole parser and the source data in my final application, I'd like to include the generated data as a compile-time constant. By adding some extra stuff to the output from show, you can make it a valid Haskell module. But GHC apparently doesn't enjoy compiling multi-MB source files.
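For concreteness, here is a minimal sketch of the "add some extra stuff to the output from show" step. The Table type, sampleTable and renderModule are hypothetical stand-ins, not the actual program; the real data is far larger, which is exactly where GHC starts to struggle.

```haskell
module Main (main) where

import qualified Data.Map as M

-- Hypothetical stand-in for the parsed data structure.
type Table = M.Map String (M.Map String Int)

sampleTable :: Table
sampleTable = M.fromList
  [ ("fruit", M.fromList [("apple", 1), ("pear", 2)])
  , ("veg",   M.fromList [("leek", 3)])
  ]

-- Wrap the output of 'show' in just enough source text to form a module.
-- 'show' on a Map prints "fromList [...]", so the generated module needs
-- 'fromList' in scope.
renderModule :: String -> Table -> String
renderModule modName table = unlines
  [ "module " ++ modName ++ " (table) where"
  , ""
  , "import Data.Map (Map, fromList)"
  , ""
  , "table :: Map String (Map String Int)"
  , "table = " ++ show table
  ]

-- Print the generated module; redirect stdout to GeneratedTable.hs.
main :: IO ()
main = putStr (renderModule "GeneratedTable" sampleTable)
```

The generated file is ordinary Haskell, so GHC has to parse, rename and typecheck the entire expression tree, not just store some bytes.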

(The weirdest part is, if you just read the data back, it actually doesn't take much time or memory. Strange, considering that both String I/O and read are supposedly very inefficient...)

I vaguely recall that other people have had trouble getting GHC to compile huge files in the past. FWIW, I tried using -O0, which sped up the crash but did not prevent it. So what is the best way to include large compile-time constants in a Haskell program?

(In my case, the constant is just a nested Data.Map with some interesting labels.)

Initially I thought GHC might just be unhappy at reading a module consisting of one line that's eight million characters long. (!!) Something to do with the layout rule or such. Or perhaps the deeply-nested expressions upset it. But I tried making each subexpression a top-level identifier, and that was no help. (Adding explicit type signatures to each one did appear to make the compiler slightly happier, however.) Is there anything else I might try to make the compiler's job simpler?

In the end, I was able to make the data structure I'm actually trying to store much smaller. (Like, 300KB.) This made GHC far happier. (And the final application much faster.) But for future reference, I'd be interested to know what the best way to approach this is.

Your best bet is probably to compile a string representation of your value into the executable. To do this in a clean manner, please refer to my answer in a previous question.

To use it, simply store your expression in myExpression.exp and do read [litFile|myExpression.exp|] with the QuasiQuotes extension enabled, and the expression will be "stored as a string literal" in the executable.
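The linked answer is not reproduced on this page, so the following is only a sketch of what a litFile quasiquoter could look like, assuming it does nothing more than turn the file's contents into a string literal. The module name Literal is an assumption; only the name litFile comes from the text above.

```haskell
module Literal (lit, litFile) where

import Language.Haskell.TH
import Language.Haskell.TH.Quote

-- Turn the quoted text into a plain string-literal expression.
-- Only expression context makes sense here, so the other fields fail.
lit :: QuasiQuoter
lit = QuasiQuoter
  { quoteExp  = return . LitE . StringL
  , quotePat  = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec  = unsupported "declaration"
  }
  where
    unsupported ctx _ = fail ("lit: not usable in a " ++ ctx ++ " context")

-- Same as 'lit', but the quoted text is a file name whose contents are
-- read at compile time.
litFile :: QuasiQuoter
litFile = quoteFile lit
```

Because of Template Haskell's stage restriction, the quasiquoter has to be defined in a different module from the one that uses it. The point of the design is that GHC only ever sees one string literal, however large the file is, and the read happens at run time.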


I tried doing something similar for storing actual constants, but it fails for the same reason that embedding the value in a .hs file would. My attempt was:

Verbatim.hs:

module Verbatim where

import Language.Haskell.TH
import Language.Haskell.TH.Quote
import Language.Haskell.Meta.Parse

readExp :: String -> Q Exp
readExp = either fail return . parseExp

verbatim :: QuasiQuoter
verbatim = QuasiQuoter { quoteExp = readExp }

verbatimFile :: QuasiQuoter
verbatimFile = quoteFile verbatim

Test program:

{-# LANGUAGE QuasiQuotes #-}
module Main (main) where

import Verbatim

main :: IO ()
main = print [verbatimFile|test.exp|]

This program works for small test.exp files, but already fails at about 2MiB on this computer.

There's a simple solution: your literal should have type ByteString. See https://github.com/litherum/publicsuffixlist/pull/1 for details.
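I haven't reproduced the linked pull request here, so this is just a rough sketch of the general idea: keep the data as a ByteString literal (via OverloadedStrings) and decode it at run time. The names rawTable and table and the tiny payload are hypothetical; in practice the literal would hold the real serialized data, probably in a more compact format than show output.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M

-- Hypothetical embedded data: the 'show' output of a small Map, kept as
-- a single ByteString literal rather than as a large expression tree.
rawTable :: B.ByteString
rawTable = "fromList [(\"apple\",1),(\"pear\",2)]"

-- Decode on first use; as a top-level constant it is evaluated at most
-- once and then shared.
table :: M.Map String Int
table = read (B.unpack rawTable)

main :: IO ()
main = print (M.lookup "pear" table)
```

The compiler only has to handle one literal here, which is why this scales where the generated-module approach does not. The cost moves to start-up, when the value is first decoded.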
