
How to use judy array lib properly in golang?

In golang, calling a C library works differently from other mainstream dynamic languages like PHP / Python / Java, because Golang has a different multitasking mechanism that is not OS-thread based. As I understand it, calling a C function may therefore result in a context switch or thread switch. In my project I'm trying to use Judy Array in Golang (as a queue worker) to do some simple but high-volume dict-related calculations like "select distinct".

What's the best practice for involving such a C lib (for relatively high-density calculation) while minimising the performance overhead as much as possible?

Despite the title, the question here really has two parts: a generic one about golang and C-interfacing for efficiency, and a specific one about performant use of judy arrays.

This thread seems to summarize the costs: https://groups.google.com/forum/#!topic/golang-nuts/RTtMsgZi88Q , so yeah, it's expensive compared to straight C, and you should try to minimize the crossover points from Go to C.

Here's additional, judy array specific advice: I've used judy arrays before in C/C++ code. The library's interface is not intuitive in certain places. And by default it uses a C-macro based API, which makes it tricky to get the interface usage correct because the compiler can't offer as much help as usual.

What I recommend, therefore, is that you write your tests and benchmarks in C first, so you understand the API and its weird cases. Judy arrays, when benchmarked for my application (vs a C++ vector of strings), were 3x faster, so it can be worth it. But break the task into three phases. First do what you want to do in C, and make sure it works as expected in your own C code. Then expand the basic C interface to handle batches of what you need done, so as to minimize the number of Go->C switches. Then bind your new C interface from Go.

If you are starting the binding for the library from scratch, I'd start by using cgo in the most straightforward way possible, and then see whether the performance meets your requirements.

If it doesn't, try minimising the number of C calls you make in commonly called spots. As you've already mentioned in the question, Go switches to a different stack when it makes a C call and this will affect the performance if you make lots of cgo calls to trivial functions. So one way to improve performance is to reduce the total number of C calls.

For example, if you need to call multiple C functions to implement one operation in your Go API, consider whether you could write a small shim C function that could combine those calls.

If the API you're wrapping deals with a lot of strings, the overhead can show up if you've got many calls like:

func foo(bar string) {
    cBar := C.CString(bar)             // copies bar into C-allocated memory
    defer C.free(unsafe.Pointer(cBar)) // releases the copy when foo returns
    C.foo(cBar)
}

That is three C calls. If the API you're wrapping can deal with unterminated strings, one option here is to pass a pointer to the string to a wrapper, and use the GoString type defined in the generated _cgo_export.h. For example, on the Go side:

func foo(bar string) {
    C.foo_wrapper(unsafe.Pointer(&bar))
}

And on the C side:

#include "_cgo_export.h"
void foo_wrapper(void *ptr_to_string) {
    GoString *bar = ptr_to_string;
    foo_with_length(bar->p, bar->n);
}

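As long as the library doesn't hold on to the string data after foo_wrapper returns, this should be safe. Note, however, that under the cgo pointer-passing rules introduced in Go 1.6 a non-empty string header counts as Go memory that itself contains a Go pointer, so passing &bar this way may be rejected by the runtime pointer check on current toolchains.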

There are probably some other optimisations that could help, but I'd strongly recommend keeping things simple initially and putting your efforts into optimising the areas that matter.
