
Can a Postgres C-language function reference a stateful variable C-side (possibly in a thread safe manner)?

I am currently attempting to compute random values drawn from a beta distribution QUICKLY. I have a slow solution in PLV8, but I know that randomkit/mtrand from numpy ( https://github.com/numpy/numpy/tree/master/numpy/random/mtrand ) is bloody quick and entirely extractable. In light of this, I have shamelessly taken the files randomkit.h, randomkit.c, distributions.h, and distributions.c from that repository and created a C-language function with the following code:

#include "postgres.h"
#include "fmgr.h"
#include "randomkit.h"
#include "distributions.h"

/*
-- DIRECTORY is path our library resides in.
CREATE FUNCTION pg_random_from_beta(DOUBLE PRECISION, DOUBLE PRECISION)  
    RETURNS DOUBLE PRECISION
    AS 'DIRECTORY/funcs', 'pg_random_from_beta'
    LANGUAGE C STRICT;
*/

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

Datum pg_random_from_beta(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(pg_random_from_beta);
Datum
pg_random_from_beta(PG_FUNCTION_ARGS)
{
    rk_state    rkstate;
    float8      alpha = PG_GETARG_FLOAT8(0);
    float8      beta  = PG_GETARG_FLOAT8(1);

    /* Seeds a fresh generator on every call; this is the expensive step. */
    rk_randomseed(&rkstate);

    PG_RETURN_FLOAT8(rk_beta(&rkstate, alpha, beta));
}

This compiles and works nicely, except that seeding a new rkstate on each function call really degrades the performance of the function. Is there a way to create the rkstate stateful variable once and reference it on subsequent calls? I envision that this could be done using static variables, but that is unlikely to be thread safe. Is there a better solution?

You can indeed store state across function calls. The PG_FUNCTION_ARGS macro expands to FunctionCallInfoData *fcinfo (see src/include/fmgr.h), whose FmgrInfo *flinfo member in turn has a member void *fn_extra.

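You can cache things between calls by allocating memory for a struct with MemoryContextAlloc and stashing a pointer to that struct in fn_extra. You must allocate in the context specified by fn_mcxt; do not use palloc, because it allocates in the current memory context, which is usually too short-lived.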

See src/backend/utils/fmgr/README and the existing uses of fn_extra (find them with git grep). It's a bit confusing, since fn_extra is documented as being for use by function call handlers, but in practice it is often used by function implementations themselves. I've posted a patch to amend the docs.

For example, see the type caching done in src/backend/utils/adt/arrayfuncs.c.
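
To make the fn_extra idea concrete, here is a minimal standalone sketch of the "allocate once, stash the pointer, reuse it" pattern. It deliberately does not include the PostgreSQL or randomkit headers so it can be compiled on its own: ModelFmgrInfo, ModelRng, model_seed, model_next and get_cached_rng are all hypothetical stand-ins, and the xorshift generator is just a placeholder for rk_state. In a real extension you would presumably use fcinfo->flinfo->fn_extra, allocate with MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, sizeof(rk_state)), and seed with rk_randomseed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's FmgrInfo: only the fn_extra slot is modelled. */
typedef struct ModelFmgrInfo
{
    void *fn_extra;         /* NULL until the function caches something */
} ModelFmgrInfo;

/* Stand-in for rk_state: a tiny xorshift generator, NOT randomkit. */
typedef struct ModelRng
{
    uint64_t s;
    int      seed_calls;    /* counts how often seeding ran */
} ModelRng;

static void
model_seed(ModelRng *rng, uint64_t seed)
{
    rng->s = seed ? seed : 1;   /* xorshift state must be non-zero */
    rng->seed_calls++;
}

static uint64_t
model_next(ModelRng *rng)
{
    rng->s ^= rng->s << 13;
    rng->s ^= rng->s >> 7;
    rng->s ^= rng->s << 17;
    return rng->s;
}

/* Return the cached generator, creating and seeding it on first use only. */
static ModelRng *
get_cached_rng(ModelFmgrInfo *flinfo)
{
    assert(flinfo != NULL);

    if (flinfo->fn_extra == NULL)
    {
        /* Real extension: MemoryContextAlloc(flinfo->fn_mcxt, ...) here. */
        ModelRng *rng = calloc(1, sizeof(ModelRng));

        if (rng == NULL)
            abort();
        model_seed(rng, 12345);     /* real extension: rk_randomseed() */
        flinfo->fn_extra = rng;
    }
    return (ModelRng *) flinfo->fn_extra;
}
```

Every call after the first skips the seeding branch and goes straight to the cached pointer, which is the whole point. Note that in PostgreSQL the lifetime of fn_extra is tied to the FmgrInfo, so I would expect the cache to last for the calls made through one call site in a query rather than for the whole session.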

Thread safety is irrelevant because PostgreSQL uses a multi-process copy-on-write model built on fork() and POSIX shared memory. Each backend is single threaded. It's possible to share data between backends, but you have to use the extension shared memory mechanism, and the data being shared must make sense when copied between processes. In your case it's not likely to make any sense to do this, and you should just initialize rk_state on first use in your backend.

Because each PostgreSQL backend is single threaded, you can also share state between functions (where fn_extra won't help you) simply by declaring and accessing a global variable. You must be careful about which memory context you use if you allocate memory for this variable with palloc or MemoryContextAlloc.
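
The global-variable route can be sketched the same way. Again this is a standalone model under stated assumptions rather than real extension code: DemoRng, demo_seed, demo_uniform and the counter are hypothetical names, and the xorshift step stands in for randomkit. In the actual function the static would presumably be an rk_state seeded with rk_randomseed on first use. Since the state lives in static storage, nothing is allocated and no memory context is involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for rk_state: a tiny xorshift generator, NOT randomkit. */
typedef struct DemoRng
{
    uint64_t s;
} DemoRng;

/* File-scope statics: one copy per process, i.e. one per backend. */
static DemoRng demo_rng;
static bool    demo_rng_ready = false;
static int     demo_seed_calls = 0;    /* illustration only */

static void
demo_seed(DemoRng *rng, uint64_t seed)
{
    rng->s = seed ? seed : 1;          /* xorshift state must be non-zero */
    demo_seed_calls++;
}

/* Draw a double in [0, 1), seeding the generator on first use only. */
static double
demo_uniform(void)
{
    if (!demo_rng_ready)
    {
        demo_seed(&demo_rng, 2024);    /* real extension: rk_randomseed() */
        demo_rng_ready = true;
    }

    demo_rng.s ^= demo_rng.s << 13;
    demo_rng.s ^= demo_rng.s >> 7;
    demo_rng.s ^= demo_rng.s << 17;

    /* Keep the top 53 bits and scale by 2^53. */
    return (double) (demo_rng.s >> 11) / 9007199254740992.0;
}
```

Unlike the fn_extra version, this state is visible to every function in the same shared library and survives for as long as the backend process does.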

Well, as it turns out, Postgres's internal random() function can be accessed to get random longs. I was able to replace rk_random() with random(), removing the need for the stateful variable. Unfortunately, this hasn't worked out as well as I hoped: the function call is many orders of magnitude slower than what I measured when I timed the randomkit library on its own, so that's a bit of a bummer.

I don't have an answer to my original question, so if anybody wants to address that, I will mark it as the right answer!
