Is there a performant way to search by a non-partitioned column in CrateDB?

My team and I have been using CrateDB for one of our projects over the past few years. We have a table with hundreds of millions of records, and performance is key.

As we've developed more and more features on this project, we've run into an interesting problem. The table has a column named 'persist_date', which records when the row was actually persisted into the table. This date does not always align with the record's start_date; for example, a record could have a start_date of 2021-06-21 and a persist_date of 2021-10-14.

Up to this point, all of our queries have been able to filter on start_date, which is the partition column. Now we have a use case that requires querying against a non-partitioned column (persist_date).

As I understand it, CrateDB is very performant, but only when a query targets one specific partition at a time. How would I go about creating a partition on this other date column without duplicating my data? Is there anything other than partitioning that might help, such as the way the table is clustered?
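To make the setup concrete, here is a minimal sketch of the situation described above. Only start_date and persist_date come from the question; the table name `records`, the `id` column, and the assumption that start_date holds day-truncated timestamps (one partition per day) are hypothetical.

```sql
-- Hypothetical table: only start_date and persist_date come from the question.
-- start_date is assumed to hold day-truncated timestamps, giving one partition per day.
CREATE TABLE records (
    id TEXT,
    start_date TIMESTAMP WITH TIME ZONE,
    persist_date TIMESTAMP WITH TIME ZONE
) PARTITIONED BY (start_date);

-- Restricted to a single partition: the filter is on the partition column.
SELECT count(*) FROM records
WHERE start_date = '2021-06-21';

-- No partition restriction: persist_date is not a partition column,
-- so the shards of every partition have to be searched.
SELECT count(*) FROM records
WHERE persist_date >= '2021-10-14' AND persist_date < '2021-10-15';
```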

You could use both columns as partition columns, e.g.:

CREATE TABLE two_parted (a TEXT, b TEXT, val DOUBLE) PARTITIONED BY (a,b);

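If either a or b is used in the WHERE clause, the query is limited to the partitions (and their shards) that match that value. However, every distinct combination of a and b values becomes its own partition, which can lead to many more shards, so you might want to partition on a weekly or monthly basis rather than a daily one.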
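Applied to the question's table, a sketch of this approach with monthly granularity could look like the following. It assumes hypothetical names (`records`, `id`, `start_month`, `persist_month`) and uses CrateDB generated columns with date_trunc and a 'month' interval to derive the partition values, so the application does not have to compute them itself. Treat it as an illustration under those assumptions, not a tested migration.

```sql
-- Hypothetical schema: partition by the month of both dates.
-- The generated columns are filled in by CrateDB on insert.
CREATE TABLE records (
    id TEXT,
    start_date TIMESTAMP WITH TIME ZONE,
    persist_date TIMESTAMP WITH TIME ZONE,
    start_month TIMESTAMP WITH TIME ZONE
        GENERATED ALWAYS AS date_trunc('month', start_date),
    persist_month TIMESTAMP WITH TIME ZONE
        GENERATED ALWAYS AS date_trunc('month', persist_date)
) PARTITIONED BY (start_month, persist_month);

-- The condition on persist_month (a partition column) restricts the query
-- to partitions for October 2021; the persist_date range then narrows
-- the result to the exact day within those partitions.
SELECT count(*)
FROM records
WHERE persist_month = '2021-10-01'
  AND persist_date >= '2021-10-14'
  AND persist_date < '2021-10-15';
```

Existing queries on start_date would likewise include a condition on start_month to keep their partition restriction. The number of partitions grows with the number of (start_month, persist_month) pairs that actually occur, which is why the coarser monthly granularity matters here.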


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM