
Best practice for dealing with package allocation in Go

I'm writing a package which makes heavy use of buffers internally for temporary storage. I have a single global (but not exported) byte slice which I start with 1024 elements and grow by doubling as needed.

However, it's quite possible that a user of my package would use it in a way that causes a large buffer to be allocated and then stop using the package, wasting a large amount of allocated heap space. I would have no way of knowing whether to free the buffer (or, since this is Go, let it be GC'd).

I've thought of three possible solutions, none of which is ideal. My question is: are any of these solutions, or maybe ones I haven't thought of, standard practice in situations like this? Is there any standard practice? Any other ideas?

  1. Screw it.

Oh well. It's too hard to deal with this, and leaving allocated memory lying around isn't so bad.

The problem with this approach is obvious: it doesn't solve the problem.

  2. Exported "I'm done" or "Shrink internal memory usage" function.

Export a function which the user can call (calling it intelligently is obviously up to them) and which frees the internal storage used by the package.

The problem with this approach is twofold. First, it makes for a more complex, less clean interface to the user. Second, it may not be possible or practical for the user to know when calling such a function is wise, so it may be useless anyway.

  3. Run a goroutine which frees the buffer after the package has gone unused for a certain period, or which shrinks the buffer (perhaps halving the length) whenever its size hasn't been increased in a while.

The problem with this approach is primarily that it puts unnecessary strain on the scheduler. Obviously a single goroutine isn't so bad, but if this were accepted practice, it wouldn't scale well if every package you imported were doing this under the hood. Also, if you have a time-sensitive application, you may not want code running when you're not aware of it (that is, you may assume that the package isn't doing any work when its functions are not being called - a reasonable assumption, I'd say).

So... any ideas?

NOTE: You can see the existing project here (the relevant code is only a few tens of lines).

A common approach to this is letting the client pass an existing []byte (or whatever) as an argument to some call/function/method. For example:

// The returned slice may be a sub-slice of dst if dst was large enough
// to hold the entire encoded block. Otherwise, a newly allocated slice
// will be returned. It is valid to pass a nil dst.
func Foo(dst []byte, whatever Bar) (ret []byte, err error)

(Example)
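A minimal sketch of that calling convention, assuming a made-up encoder (EncodeUpper is a hypothetical name, not part of any real package): the function reuses dst when its capacity is sufficient and allocates only otherwise, so the caller decides how long the buffer lives.

```go
package main

import "fmt"

// EncodeUpper is a hypothetical stand-in for a real encoder: it writes the
// ASCII-uppercased form of src into dst and returns the result.
// The returned slice is a sub-slice of dst if dst has enough capacity;
// otherwise a newly allocated slice is returned. A nil dst is valid.
func EncodeUpper(dst, src []byte) []byte {
	if cap(dst) < len(src) {
		dst = make([]byte, len(src)) // caller's buffer is too small: allocate
	}
	dst = dst[:len(src)]
	for i, c := range src {
		if 'a' <= c && c <= 'z' {
			c -= 'a' - 'A'
		}
		dst[i] = c
	}
	return dst
}

func main() {
	scratch := make([]byte, 0, 64) // owned and reused by the caller
	for _, s := range []string{"hello", "gopher"} {
		out := EncodeUpper(scratch, []byte(s))
		fmt.Printf("%s\n", out)
	}
}
```

Because the package keeps no buffer of its own, nothing stays allocated once the caller drops its scratch slice.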

Another approach is to get a new []byte from, for example, a cache and/or a pool (if you prefer the latter name for that concept) and rely on clients to return used buffers to such a "recycle bin".
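One way to sketch the pool idea is with the standard library's sync.Pool. This is only an illustration, not necessarily the cache or pool the answer links to; getBuf, putBuf and process are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out *[]byte values. Storing a pointer avoids an extra
// allocation each time a slice is put back into the pool.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 1024)
		return &b
	},
}

// getBuf and putBuf are hypothetical helper names for this sketch.
func getBuf() *[]byte { return bufPool.Get().(*[]byte) }

func putBuf(b *[]byte) {
	*b = (*b)[:0] // keep the capacity, drop the contents
	bufPool.Put(b)
}

// process borrows a buffer, uses it as scratch space, and returns it.
func process(msg string) string {
	bp := getBuf()
	defer putBuf(bp)
	*bp = append(*bp, "processed: "...)
	*bp = append(*bp, msg...)
	return string(*bp) // copies the bytes out before the buffer is recycled
}

func main() {
	fmt.Println(process("first"))
	fmt.Println(process("second"))
}
```

Items held by a sync.Pool may be removed automatically at any time, so buffers that sit idle in the pool do not stay allocated forever, which speaks to the concern about a package that is no longer being used.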

BTW: You're doing it right by thinking about this. Where it's possible to reasonably reuse []byte buffers, there's a potential for lowering the GC load and thus making your program perform better. Sometimes the difference can be critical.

I have a single global (but not exported) byte slice which I start with 1024 elements and grow by doubling as needed.

And there's your problem. You shouldn't have a global like this in your package.

Generally the best approach is to have an exported struct with attached methods. The buffer should be an unexported field of this struct. That way the user can instantiate the struct, and the garbage collector cleans it up, buffer included, when they let go of it.

You also want to avoid requiring globals like this because they can hamper unit tests. A unit test should be able to instantiate the exported struct, as the user can, and do so afresh for every test.
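A rough sketch of that layout, with hypothetical names (Encoder, NewEncoder, Quote) chosen only for illustration:

```go
package main

import "fmt"

// Encoder is a hypothetical exported type. buf is unexported, so callers
// cannot touch it, and it is garbage collected together with the Encoder.
type Encoder struct {
	buf []byte
}

// NewEncoder returns an Encoder with a 1024-byte initial buffer capacity.
func NewEncoder() *Encoder {
	return &Encoder{buf: make([]byte, 0, 1024)}
}

// Quote returns s wrapped in double quotes, using e.buf as scratch space.
func (e *Encoder) Quote(s string) string {
	e.buf = e.buf[:0] // reuse the storage left over from the previous call
	e.buf = append(e.buf, '"')
	e.buf = append(e.buf, s...)
	e.buf = append(e.buf, '"')
	return string(e.buf)
}

func main() {
	e := NewEncoder()
	fmt.Println(e.Quote("hello"))
	fmt.Println(e.Quote("world"))
	// Once e becomes unreachable, its buffer can be collected with it.
}
```

Each test, and each goroutine, can create its own Encoder, so no state is shared between them. A single Encoder in this sketch is not safe for concurrent use.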

Also, depending on what kind of buffer you need, bytes.Buffer may be useful as it already implements io.Reader and io.Writer. bytes.Buffer also grows its buffer automatically. In buffer.go you'll see various calls to b.Truncate(0) with the comment "reset to recover space"; note that these empty the buffer so that the already allocated storage can be reused, rather than returning memory to the garbage collector.
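As a small illustration of reusing a bytes.Buffer as scratch space (Formatter and Greeting are hypothetical names): Reset, which is documented as the same as Truncate(0), empties the buffer but retains the underlying storage for future writes.

```go
package main

import (
	"bytes"
	"fmt"
)

// Formatter is a hypothetical type that keeps a bytes.Buffer as scratch
// space. The zero value is ready to use.
type Formatter struct {
	buf bytes.Buffer
}

// Greeting builds a greeting for name in the internal buffer.
func (f *Formatter) Greeting(name string) string {
	f.buf.Reset() // empties the buffer, keeps its storage for reuse
	f.buf.WriteString("hello, ")
	f.buf.WriteString(name)
	return f.buf.String()
}

func main() {
	var f Formatter
	fmt.Println(f.Greeting("gopher"))
	fmt.Println(f.Greeting("world"))
}
```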

You could reslice your buffer at the end of every operation.

buffer = buffer[:0]

Then your function extendAndSliceBuffer would most likely still have the original backing array available if it needs to grow. If not, you would suffer a new allocation, which you might get anyway when you call extendAndSliceBuffer.
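To make the effect of reslicing concrete, here is a sketch with a hypothetical grow function standing in for extendAndSliceBuffer (the project's real implementation is not shown here, so the doubling logic below is an assumption based on the question's description):

```go
package main

import "fmt"

// grow is a hypothetical stand-in for the question's extendAndSliceBuffer.
// It returns a slice of length n, reusing buf's backing array when the
// capacity allows and doubling the capacity otherwise. Old contents are
// not copied, since the buffer is treated as scratch space.
func grow(buf []byte, n int) []byte {
	if n <= cap(buf) {
		return buf[:n] // no allocation: reuse the existing backing array
	}
	newCap := cap(buf)
	if newCap == 0 {
		newCap = 1024
	}
	for newCap < n {
		newCap *= 2
	}
	return make([]byte, n, newCap)
}

func main() {
	buf := make([]byte, 0, 1024)

	buf = grow(buf, 100)
	fmt.Println(len(buf), cap(buf)) // 100 1024

	buf = buf[:0] // end of operation: drop the contents, keep the array
	fmt.Println(len(buf), cap(buf)) // 0 1024

	buf = grow(buf, 3000) // exceeds the capacity: a new array is allocated
	fmt.Println(len(buf), cap(buf)) // 3000 4096
}
```

Reslicing to zero length only resets the length; the capacity, and therefore the memory held by the global, stays the same.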

Overall, I think a cleaner solution is to do as @jnml said and let the users pass their own buffer if they care about performance. If they don't care about performance, then you should not use a global var; simply allocate the buffer as you need it and let it go when it goes out of scope.

It's generally really, really bad form to write Go code that is not thread-safe. If two different goroutines call functions that modify the buffer at the same time, who knows what state the buffer will be in when they finish? Just let the user provide a scratch-space buffer if they decide that the allocation performance is a bottleneck.


 