
Multiply rows in group with SQLite

I have a query that returns the probability that a token has a certain classification.

token       class       probPaired
----------  ----------  ----------
potato      A           0.5
potato      B           0.5
potato      C           1.0
potato      D           0.5
time        A           0.5
time        B           1.0
time        C           0.5

I need to aggregate the probabilities of each class by multiplying them together.

-- Imaginary MUL operator
select class, MUL(probPaired) from myTable group by class;

class       probability
----------  ----------
A           0.25
B           0.5
C           0.5
D           0.5

How can I do this in SQLite? SQLite doesn't have features like LOG/EXP or variables, which are the solutions mentioned in other questions.

In general, if SQLite can't do it, you can write a custom function instead. The details depend on what programming language you're using; here it is in Perl using DBD::SQLite. Note that functions created this way are not stored procedures: they exist only for that connection and must be recreated each time you connect.

For an aggregate function, you have to create a class which handles the aggregation. MUL is pretty simple, just an object to store the product.

{
    package My::SQLite::MUL;

    sub new {
        my $class = shift;
        my $mul = 1;
        return bless \$mul, $class;
    }

    sub step {
        my $self = shift;
        my $num = shift;

        $$self *= $num;

        return;
    }

    sub finalize {
        my $self = shift;

        return $$self;
    }
}

Then you'd install that as the aggregate function MUL which takes a single argument and uses that class.

my $dbh = ...doesn't matter how the connection is made...

$dbh->sqlite_create_aggregate("MUL", 1, "My::SQLite::MUL");

And now you can use MUL in queries.

my $rows = $dbh->selectall_arrayref(
    "select class, MUL(probPaired) from myTable group by class"
);

Again, the details will differ with your particular language, but the basic idea will be the same.

This is significantly faster than fetching every row into your program and computing the product for each group there.
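
To illustrate how the same idea carries over to another language, here is a minimal sketch in Python using the standard library's sqlite3 module and its create_aggregate method. The class name Mul and the helpers make_sample_db and class_probabilities are made up for this example, and the sample rows are the ones from the question. Unlike the Perl version above, this sketch skips NULL values, which is a choice made here rather than something the original answer specifies.

```python
import sqlite3


class Mul:
    """Aggregate that multiplies all non-NULL values in a group."""

    def __init__(self):
        self.product = 1.0

    def step(self, value):
        # Called once per row in the group; NULLs are ignored (an assumption).
        if value is not None:
            self.product *= value

    def finalize(self):
        # Called once per group to produce the aggregate's result.
        return self.product


def make_sample_db():
    """Build an in-memory table holding the rows shown in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table myTable (token text, class text, probPaired real)")
    conn.executemany(
        "insert into myTable values (?, ?, ?)",
        [
            ("potato", "A", 0.5), ("potato", "B", 0.5),
            ("potato", "C", 1.0), ("potato", "D", 0.5),
            ("time", "A", 0.5), ("time", "B", 1.0), ("time", "C", 0.5),
        ],
    )
    return conn


def class_probabilities(conn):
    """Register MUL on this connection and return {class: product}."""
    conn.create_aggregate("MUL", 1, Mul)
    rows = conn.execute(
        "select class, MUL(probPaired) from myTable group by class"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(class_probabilities(make_sample_db()))
```

As with the Perl version, the aggregate is registered per connection, so it has to be registered again on every new connection.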

You can calculate row numbers and then use a recursive CTE for the multiplication: number the rows within each class, carry a running product upward from row 1, and then select the row with the max rnum (calculated row number) for each class, which holds the final product.

-- t is the source table (myTable in the question)
-- Calculating row numbers within each class, ordered by token
with recursive rownums as (
    select t1.*,
           (select count(*) from t t2
            where t2.token <= t1.token and t2.class = t1.class) as rnum
    from t t1
)
-- Getting the max rnum for each class
, max_rownums as (
    select class, max(rnum) as max_rnum from rownums group by class
)
-- Recursive CTE starts here: row 1 seeds the running product
, cte(class, rnum, probPaired, running_mul) as (
    select class, rnum, probPaired, probPaired as running_mul
    from rownums where rnum = 1
    union all
    -- each step multiplies in the next row of the same class
    select r.class, r.rnum, r.probPaired, c.running_mul * r.probPaired
    from cte c
    join rownums r on r.class = c.class and r.rnum = c.rnum + 1
)
-- Final value selection: the last row of each class holds the full product
select c.class, c.running_mul
from cte c
join max_rownums m on m.max_rnum = c.rnum and m.class = c.class;
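
One way to try this query without installing anything is to run it from Python's built-in sqlite3 module against an in-memory table, as in the sketch below. The table name t follows the query above, the rows are the sample data from the question, and the names RECURSIVE_PRODUCT_SQL and products_by_class are invented for this example. Note that the row numbering relies on each token appearing at most once per class; duplicate tokens within a class would get the same rnum and break the chain.

```python
import sqlite3

# Same technique as the query above: number rows per class, then let a
# recursive CTE carry a running product from row 1 to the last row.
RECURSIVE_PRODUCT_SQL = """
with recursive rownums as (
    select t1.*,
           (select count(*) from t t2
            where t2.token <= t1.token and t2.class = t1.class) as rnum
    from t t1
),
max_rownums as (
    select class, max(rnum) as max_rnum from rownums group by class
),
cte(class, rnum, probPaired, running_mul) as (
    select class, rnum, probPaired, probPaired from rownums where rnum = 1
    union all
    select r.class, r.rnum, r.probPaired, c.running_mul * r.probPaired
    from cte c
    join rownums r on r.class = c.class and r.rnum = c.rnum + 1
)
select c.class, c.running_mul
from cte c
join max_rownums m on m.class = c.class and m.max_rnum = c.rnum
order by c.class
"""

# Sample rows copied from the question.
SAMPLE_ROWS = [
    ("potato", "A", 0.5), ("potato", "B", 0.5),
    ("potato", "C", 1.0), ("potato", "D", 0.5),
    ("time", "A", 0.5), ("time", "B", 1.0), ("time", "C", 0.5),
]


def products_by_class(rows):
    """Load rows into an in-memory table t and return {class: product}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (token text, class text, probPaired real)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = dict(conn.execute(RECURSIVE_PRODUCT_SQL).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    for cls, product in products_by_class(SAMPLE_ROWS).items():
        print(cls, product)
```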

