I am currently attempting to compute random values drawn from a beta distribution QUICKLY. I have a slow solution in PLV8, but I know that randomkit/mtrand from numpy ( https://github.com/numpy/numpy/tree/master/numpy/random/mtrand ) is bloody quick and entirely extractable. In light of this, I have shamelessly taken the files randomkit.h, randomkit.c, distributions.h, and distributions.c from that repository and created a C-language function with the following code:
#include "postgres.h"
#include "fmgr.h"
#include "randomkit.h"
#include "distributions.h"
/*
-- DIRECTORY is the path our library resides in.
CREATE FUNCTION pg_random_from_beta(DOUBLE PRECISION, DOUBLE PRECISION)
RETURNS DOUBLE PRECISION
AS 'DIRECTORY/funcs', 'pg_random_from_beta'
LANGUAGE C STRICT;
*/
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
Datum pg_random_from_beta(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(pg_random_from_beta);
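Datum
pg_random_from_beta(PG_FUNCTION_ARGS)
{
    /* Declarations first, C89 style, as PostgreSQL code conventionally expects. */
    rk_state rkstate;
    float8 alpha = PG_GETARG_FLOAT8(0);
    float8 beta = PG_GETARG_FLOAT8(1);

    /* A fresh generator is seeded on every call; this is the slow part. */
    rk_randomseed(&rkstate);
    PG_RETURN_FLOAT8(rk_beta(&rkstate, alpha, beta));
}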
This compiles and works nicely, except that randomly seeding a new rkstate on each function call really degrades the performance of the function. Is there a way to create the stateful rkstate variable once and reference it on subsequent calls? I envision that this could be done using static variables, but that is unlikely to be thread safe. Is there a better solution?
You can indeed store state across calls. The PG_FUNCTION_ARGS macro expands to FunctionCallInfo fcinfo, a pointer to the call-info struct (see src/include/fmgr.h), whose FmgrInfo *flinfo member has a void *fn_extra member.
You can cache things between calls by allocating memory for a struct with MemoryContextAlloc and stashing a pointer to the struct in fn_extra. You must allocate in the context specified by fn_mcxt; do not use palloc, which allocates in the current (usually short-lived) memory context.
See src/backend/utils/fmgr/README and existing usage of fn_extra (find it with git grep). It's a bit confusing, since fn_extra is documented as being for use by function call handlers, but in practice it is often used by function implementations themselves. I've posted a patch to amend the docs.
For example, see the type caching done in src/backend/utils/adt/arrayfuncs.c.
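The caching pattern described above can be sketched outside the server. This is only a model, not extension code: MockFmgrInfo is a hypothetical stand-in for FmgrInfo with just the fn_extra field, MockRngState and its xorshift generator are hypothetical stand-ins for rk_state and randomkit, and malloc stands in for MemoryContextAlloc on fn_mcxt. In a real function you would reach the same field through fcinfo->flinfo->fn_extra.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's FmgrInfo; only fn_extra is modelled. */
typedef struct MockFmgrInfo {
    void *fn_extra;            /* NULL until the function caches something */
} MockFmgrInfo;

/* Hypothetical stand-in for rk_state: a small xorshift generator. */
typedef struct MockRngState {
    uint64_t s;
} MockRngState;

/* Counts how often the (expensive) seeding step ran, for illustration. */
static int total_seedings = 0;

static void mock_seed(MockRngState *st)
{
    st->s = 0x9E3779B97F4A7C15ULL;   /* fixed nonzero seed for the demo */
    total_seedings++;
}

/* xorshift64 step, mapped to a double in [0, 1). */
static double mock_next_double(MockRngState *st)
{
    st->s ^= st->s << 13;
    st->s ^= st->s >> 7;
    st->s ^= st->s << 17;
    return (double) (st->s >> 11) / 9007199254740992.0;   /* divide by 2^53 */
}

/* Model of the function body: seed on first use, reuse the state afterwards. */
double draw_with_cache(MockFmgrInfo *flinfo)
{
    MockRngState *st = (MockRngState *) flinfo->fn_extra;

    if (st == NULL)
    {
        /* Real extension: MemoryContextAlloc(flinfo->fn_mcxt, sizeof(rk_state)) */
        st = malloc(sizeof(MockRngState));
        assert(st != NULL);
        mock_seed(st);
        flinfo->fn_extra = st;   /* stash the pointer for later calls */
    }
    return mock_next_double(st);
}
```

Calling draw_with_cache repeatedly with the same MockFmgrInfo seeds once and then only advances the generator. In the server, the cached state lives only as long as the FmgrInfo it hangs off, so this does not by itself give you one generator for the whole backend.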
Thread safety is irrelevant because PostgreSQL uses a multi-process, copy-on-write model built on fork() and POSIX shared memory. Each backend is single threaded. It's possible to share state between backends, but you have to use the extension shared memory mechanism, and the data being shared must make sense when copied between processes. In your case it's not likely to make any sense to do this, and you should just initialize rk_state on first use in your backend.
Because each PostgreSQL backend is single threaded, you can also share state between functions (where fn_extra won't help you) simply by declaring and accessing a global variable. You must be careful about the memory context you use if you allocate memory for this variable using palloc or MemoryContextAlloc.
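As a rough illustration of the global-variable approach, here is a minimal standalone sketch of "initialize on first use" with file-scope state shared by two functions. Everything in it is hypothetical: DemoRng and its xorshift step stand in for rk_state and randomkit, and demo_uniform and demo_coin_flip stand in for two SQL-callable functions. Because the state has static storage, nothing is palloc'd and the memory-context caveat does not arise here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for rk_state. */
typedef struct DemoRng {
    uint64_t s;
} DemoRng;

/* File-scope state: one copy per process, which in PostgreSQL means per backend. */
static DemoRng backend_rng;
static bool    backend_rng_ready = false;
static int     demo_seed_calls = 0;     /* illustration only */

/* Seed lazily on first use, then hand back the same state every time. */
static DemoRng *get_backend_rng(void)
{
    if (!backend_rng_ready)
    {
        backend_rng.s = 88172645463325252ULL;   /* fixed nonzero demo seed */
        demo_seed_calls++;
        backend_rng_ready = true;
    }
    return &backend_rng;
}

/* xorshift64 step on the shared state. */
static uint64_t demo_next(void)
{
    DemoRng *rng = get_backend_rng();

    rng->s ^= rng->s << 13;
    rng->s ^= rng->s >> 7;
    rng->s ^= rng->s << 17;
    return rng->s;
}

/* Two different functions drawing from the same generator. */
double demo_uniform(void)
{
    return (double) (demo_next() >> 11) / 9007199254740992.0;   /* in [0, 1) */
}

int demo_coin_flip(void)
{
    return (int) (demo_next() >> 63);   /* top bit: 0 or 1 */
}
```

However many times and in whatever order the two functions are called, the seeding branch runs once per process. If the state had to be heap-allocated instead, it would need to come from a context that lives as long as the backend, such as TopMemoryContext, rather than from palloc in the current context.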
Well, as it turns out, Postgres's internal random() function can be accessed to get random longs. I was able to replace rk_random() with random(), negating the need for the stateful variable. Unfortunately, this hasn't worked out as well as I hoped: the function call is many orders of magnitude slower than when I timed the randomkit library myself, so that's a bit of a bummer.
I don't have an answer to my original question, so if anybody wants to address that, I will mark it as the right answer!