How to add a finalizer on a TVar

Background

In response to a question, I built and uploaded a bounded-tchan (it wouldn't have been right for me to upload jnb's version). If the name isn't enough, a bounded-tchan (BTChan) is an STM channel that has a maximum capacity (writes block if the channel is at capacity).

Recently, I've received a request to add a dup feature like the one regular TChans have. And thus begins the problem.

How the BTChan looks

A simplified (and actually non-functional) view of BTChan is below.

data BTChan a = BTChan
    { max :: Int
    , count :: TVar Int
    , channel :: TVar [(Int, a)]
    , nrDups  :: TVar Int
    }

Every time you write to the channel you include the number of dups (nrDups) in the tuple. This is an 'individual element counter' which indicates how many readers have yet to read this element.

Every reader will decrement the counter for the element it reads, then move its read-pointer to the next element in the list. If the reader decrements the counter to zero, then the value of count is decremented to properly reflect available capacity on the channel.

To be clear on the desired semantics: a channel's capacity indicates the maximum number of elements queued in the channel. Any given element is queued until a reader of each dup has received the element. No elements should remain queued for a GCed dup (this is the main problem).

For example, let there be three dups of a channel (c1, c2, c3) with a capacity of 2, where 2 items were written into the channel and then all items were read out of c1 and c2. The channel is still full (0 remaining capacity) because c3 hasn't consumed its copies. At any point in time, if all references to c3 are dropped (so c3 is GCed), then the capacity should be freed (restored to 2 in this case).
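To make these semantics concrete, here is a small pure model of them. It is only a sketch for reasoning about the expected behaviour: every name in it (Model, dupModel, dropReader and so on) is hypothetical, it tracks explicitly which readers are still owed each element instead of keeping a bare counter, and it is not how bounded-tchan is implemented.

```haskell
-- Pure model of the desired semantics; every name here is hypothetical.
type ReaderId = Int

data Model a = Model
    { capacity :: Int                 -- maximum number of queued elements
    , nextId   :: ReaderId            -- id handed to the next dup
    , readers  :: [ReaderId]          -- live handles (original plus dups)
    , queued   :: [([ReaderId], a)]   -- oldest first, with readers still owed
    } deriving (Eq, Show)

-- A fresh channel has a single reader, the original handle (id 0).
newModel :: Int -> Model a
newModel cap = Model cap 1 [0] []

-- A dup only sees elements written after it was created.
dupModel :: Model a -> (ReaderId, Model a)
dupModel m = (r, m { nextId = r + 1, readers = r : readers m })
  where r = nextId m

-- Elements that no live reader is owed stop occupying capacity.
prune :: Model a -> Model a
prune m = m { queued = filter (not . null . fst) (queued m) }

-- Nothing means the writer would block.
writeModel :: a -> Model a -> Maybe (Model a)
writeModel x m
    | length (queued m) >= capacity m = Nothing
    | otherwise = Just (prune m { queued = queued m ++ [(readers m, x)] })

-- Nothing means the reader would block.
readModel :: ReaderId -> Model a -> Maybe (a, Model a)
readModel r m = case break (elem r . fst) (queued m) of
    (_, []) -> Nothing
    (older, (rs, x) : newer) ->
        Just (x, prune m { queued = older ++ (filter (/= r) rs, x) : newer })

-- What a collected dup should cause: nothing stays queued on its behalf.
dropReader :: ReaderId -> Model a -> Model a
dropReader r m = prune m
    { readers = filter (/= r) (readers m)
    , queued  = [ (filter (/= r) rs, x) | (rs, x) <- queued m ]
    }

freeSlots :: Model a -> Int
freeSlots m = capacity m - length (queued m)
```

In this model, the c1/c2/c3 scenario leaves freeSlots at 0 after c1 and c2 have read both items, and dropReader on c3 brings it back to 2. The open question is how to trigger the equivalent of dropReader automatically.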

Here's the issue: let's say I have the following code:

c <- newBTChan 1
_ <- dupBTChan c  -- This represents what would probably be a pathological bug or terminated reader
writeBTChan c "hello"
_ <- readBTChan c

Causing the BTChan to look like:

BTChan 1 (TVar 0) (TVar []) (TVar 1)             -->   -- newBTChan
BTChan 1 (TVar 0) (TVar []) (TVar 2)             -->   -- dupBTChan
BTChan 1 (TVar 1) (TVar [(2, "hello")]) (TVar 2) -->   -- writeBTChan c "hello"
BTChan 1 (TVar 1) (TVar [(1, "hello")]) (TVar 2)       -- readBTChan c: OH NO!

Notice that at the end the read count for "hello" is still 1? That means the message is not considered gone (even though it will get GCed in the real implementation) and our count will never decrement. Because the channel is at capacity (1 element maximum), the writers will always block.

I want a finalizer created each time dupBTChan is called. When a dupped (or original) channel is collected, every element remaining to be read on that channel will have its per-element count decremented, and the nrDups variable will be decremented as well. As a result, future writes will have the correct count (a count that doesn't reserve space for elements left unread by GCed channels).

Solution 1 - Manual Resource Management (what I want to avoid)

JNB's bounded-tchan actually has manual resource management for this reason. See the cancelBTChan. I'm going for something harder for the user to get wrong (not that manual management isn't the right way to go in many cases).

Solution 2 - Use exceptions by blocking on TVars (GHC can't do this how I want)

EDIT: this solution, and solution 3 which is just a spin-off, do not work! Due to bug 5055 (WONTFIX), GHC sends exceptions to both blocked threads, even though one is sufficient (which is theoretically determinable, but not practical with the GHC GC).

If all the ways to get a BTChan are IO, we can forkIO a thread that reads/retries on an extra (dummy) TVar field unique to the given BTChan. The new thread will catch an exception when all other references to the TVar are dropped, so it will know when to decrement the nrDups and individual element counters. This should work but forces all my users to use IO to get their BTChans:

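import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Exception (BlockedIndefinitelyOnSTM (..), SomeException, catch, fromException)

data BTChan a = BTChan { ... as before ..., dummyTV :: TVar () }

dupBTChan :: BTChan a -> IO (BTChan a)
dupBTChan c = do
       ... as before ...
       d <- newTVarIO ()
       let chan = BTChan ... d
       _ <- forkIO $ watchBTChan chan
       return chan

watchBTChan :: BTChan a -> IO ()
watchBTChan b =
    catch (atomically (readTVar (dummyTV b) >> retry)) handler
  where
    handler :: SomeException -> IO ()
    handler e = case fromException e of
        -- the BTChan must have gotten collected
        Just BlockedIndefinitelyOnSTM -> atomically $ do
            ls <- readTVar (channel b)
            -- one fewer reader is owed each queued element
            writeTVar (channel b) (map (\(n, x) -> (n - 1, x)) ls)
            readTVar (nrDups b) >>= writeTVar (nrDups b) . subtract 1
        -- any other exception: keep watching
        _ -> watchBTChan b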

EDIT: Yes, this is a poor man's finalizer, and I don't have any particular reason to avoid using addFinalizer. That would be the same solution, still forcing use of IO afaict.
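For comparison, a real finalizer can be hung on the dummy TVar itself. The sketch below assumes a version of the stm package that provides mkWeakTVar (in Control.Concurrent.STM.TVar), which is generally a safer choice than addFinalizer on an ordinary boxed value, since the System.Mem.Weak documentation warns that such finalizers may run earlier than expected. The names releaseDup and attachCleanup are hypothetical, and the cleanup works on the simplified fields shown earlier, not on the real BTChan. It still needs IO, and GHC gives no promise about when (or whether) the finalizer runs.

```haskell
import Control.Concurrent.STM
import Control.Monad (void)

-- Hypothetical cleanup for the simplified BTChan fields (count, channel,
-- nrDups): one fewer reader is owed each queued element, elements that no
-- reader is owed any more are dequeued, and count is lowered to match.
releaseDup :: TVar Int -> TVar [(Int, a)] -> TVar Int -> STM ()
releaseDup countTV chanTV dupsTV = do
    ls <- readTVar chanTV
    -- keep only elements that still have a reader after the decrement
    let ls' = [ (n - 1, x) | (n, x) <- ls, n > 1 ]
    writeTVar chanTV ls'
    modifyTVar' countTV (subtract (length ls - length ls'))
    modifyTVar' dupsTV (subtract 1)

-- Run the cleanup once the per-dup token TVar becomes unreachable.
-- The timing is entirely up to the garbage collector.
attachCleanup :: TVar () -> STM () -> IO ()
attachCleanup token cleanup = void (mkWeakTVar token (atomically cleanup))
```

A dupBTChan in IO could call attachCleanup on its dummyTV with releaseDup applied to the shared fields, which replaces the watcher thread but not the IO requirement.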

Solution 3: A cleaner API than solution 2, but GHC still doesn't support it

Users start a manager thread by calling initBTChanCollector, which will monitor a set of these dummy TVars (from solution 2) and do the needed clean-up. Basically, it shoves the IO into another thread that knows what to do via a global (unsafePerformIO'ed) TVar. Things work basically like solution 2, but the creation of BTChans can still be STM. Failure to run initBTChanCollector would result in an ever-growing list (space leak) of tasks as the process runs.
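The shape of that design can be sketched as follows. Only the name initBTChanCollector comes from the description above; pendingWatches and registerWatch are hypothetical, and the queued IO actions stand in for the per-channel watchers of solution 2. This is an illustration of the plumbing, not the released code, and it inherits the same problem with exceptions being delivered to both blocked threads.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forever, void)
import System.IO.Unsafe (unsafePerformIO)

-- Global queue of watcher actions waiting to be started (hypothetical name).
-- NOINLINE keeps the unsafePerformIO from being duplicated.
{-# NOINLINE pendingWatches #-}
pendingWatches :: TVar [IO ()]
pendingWatches = unsafePerformIO (newTVarIO [])

-- Callable from STM, so creating or dupping a channel need not be in IO.
registerWatch :: IO () -> STM ()
registerWatch w = modifyTVar' pendingWatches (w :)

-- Manager thread: blocks until watchers are queued, then forks each one.
-- If this is never called, pendingWatches only grows (the space leak).
initBTChanCollector :: IO ()
initBTChanCollector = void . forkIO . forever $ do
    ws <- atomically $ do
        ws <- readTVar pendingWatches
        if null ws
            then retry
            else writeTVar pendingWatches [] >> pure ws
    mapM_ forkIO ws
```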

Solution 4: Never allow discarding BTChans

This is akin to ignoring the problem. If the user never drops a dupped BTChan then the issue disappears.

Solution 5

I see ezyang's answer (totally valid and appreciated), but I would really like to keep the current API, just with a 'dup' function.

Solution 6

Please tell me there's a better option.

EDIT: I implemented solution 3 (totally untested alpha release) and handled the potential space leak by making the global itself a BTChan. That chan should probably have a capacity of 1 so that forgetting to run init shows up really quickly, but that's a minor change. This works in GHCi (7.0.3), but that seems to be incidental. GHC throws exceptions to both blocked threads (the valid one reading the BTChan and the watching thread), so if you are blocked reading a BTChan when another thread discards its reference, then you die.

Here is another solution: require all accesses to the bounded channel duplicate to be bracketed by a function that releases its resources on exit (whether by an exception or normally). You can use a monad with a rank-2 runner to prevent duplicated channels from leaking out. It's still manual, but the type system makes it a lot harder to do naughty things.
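A minimal sketch of that idea, under some simplifying assumptions: the body runs in plain IO instead of a dedicated monad, the only resource tracked is an nrDups-style counter, and the names Dup, liveDups and withDup are hypothetical. The phantom type variable s plays the same role as in runST: because the body must work for every s, a Dup s cannot appear in the result type, so the handle cannot be returned from the bracket.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Concurrent.STM
import Control.Exception (bracket)

-- Hypothetical handle for a scoped dup; the phantom s ties it to one
-- withDup call.
newtype Dup s = Dup (TVar Int)

-- Number of live dups as seen through a handle.
liveDups :: Dup s -> IO Int
liveDups (Dup tv) = readTVarIO tv

-- Register a dup for the duration of the body and release it on exit,
-- whether the body returns normally or throws.
withDup :: TVar Int -> (forall s. Dup s -> IO r) -> IO r
withDup nrDupsTV body = bracket acquire release (\d -> body d)
  where
    acquire   = atomically (modifyTVar' nrDupsTV (+ 1)) >> pure (Dup nrDupsTV)
    release _ = atomically (modifyTVar' nrDupsTV (subtract 1))
```

With this in place, withDup tv liveDups type-checks, while withDup tv pure is rejected because its result would mention s. A body that forks a thread holding the handle can still outlive the bracket, which is one reason a dedicated monad is the stricter version of this design.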

You really don't want to rely on true IO finalizers, because GHC gives no guarantees about when a finalizer may be run: for all you know, it may wait until the end of the program before running the finalizer, which means you're deadlocked until then.
