
Multi Tenancy in ClickHouse

Many people want to use ClickHouse not just for their own company's or project's analytics, but as the backbone of a SaaS data/analytics product. Most of the time that means supporting semi-structured JSON data, which can result in a large number of columns for each user you have.

Now, some experienced ClickHouse users say that fewer tables means better performance, so having a separate table for each user is not an option.

Also, putting the data of too many users into the same database results in a very large number of columns, which some experiments suggest can make ClickHouse unresponsive.

So what about something like 20 users per database, with each user limited to 50 columns? But what if you have thousands of users? Should you create thousands of databases?

What is the best possible solution to this problem?

Note: In our case, data isolation is not an issue; we handle it at the application level.

There is no difference between 1000 tables in a single database and 1000 databases with a single table.

There is ALMOST no difference between 1000 tables and one table with 1000 partitions, partitioned by (tenant_id, some_expression_from_datetime).

The problem is the overhead of the MergeTree and ReplicatedMergeTree engines, and the number of files you need to create and read (a data locality problem; it is not specific to files and would be the same without a filesystem).
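To get a feel for why per-tenant tables and per-tenant partitions end up costing about the same, here is a rough back-of-the-envelope sketch in Python. It relies on one MergeTree property: a part belongs to exactly one partition, so every non-empty partition holds at least one active part. The function name, the layout labels, and the tenant and month counts are illustrative assumptions, not figures from a real deployment, and the sketch assumes every tenant writes data in every month.

```python
def min_active_parts(tenants: int, months: int, layout: str) -> int:
    """Lower bound on active parts for a layout (hypothetical helper).

    A part belongs to exactly one partition of one table, so each
    non-empty (table, partition) pair contributes at least one part.
    """
    if layout == "table_per_tenant":
        # One table per tenant, each partitioned by month.
        return tenants * months
    if layout == "partition_per_tenant":
        # One table, PARTITION BY (tenant_id, month).
        return tenants * months
    if layout == "shared_order_by":
        # One table, PARTITION BY month, ORDER BY (tenant_id, ...).
        return months
    raise ValueError(f"unknown layout: {layout}")


for layout in ("table_per_tenant", "partition_per_tenant", "shared_order_by"):
    print(layout, min_active_parts(1000, 12, layout))
```

With 1000 tenants and 12 months, the first two layouts need at least 12,000 parts each, while the shared table needs at least 12. Real part counts are higher until background merges catch up, which this sketch does not model.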

If you have 1000 tenants, the only way is to use ORDER BY (tenant_id, ...) plus restrictions enforced with row policies or at the application level.
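A minimal sketch of that layout, written as a small Python helper that builds the ClickHouse statements. The table name, column names, monthly partitioning, policy name, and user name are all hypothetical; only the ORDER BY (tenant_id, ...) key and the use of a row policy come from the answer above.

```python
import re

# Hypothetical shared table: every tenant's rows live in one MergeTree table,
# and tenant_id leads the sorting key so one tenant's rows are stored together.
SHARED_TABLE_DDL = """\
CREATE TABLE events
(
    tenant_id  UInt32,
    event_time DateTime,
    event_name LowCardinality(String),
    payload    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (tenant_id, event_time)"""


def row_policy_sql(tenant_id: int, user: str, table: str = "events") -> str:
    """Build a CREATE ROW POLICY statement restricting `user` to one tenant."""
    if not isinstance(tenant_id, int) or tenant_id < 0:
        raise ValueError("tenant_id must be a non-negative integer")
    # Accept only plain identifiers so nothing unexpected is spliced into the SQL.
    for name in (user, table):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"CREATE ROW POLICY tenant_{tenant_id}_only ON {table} "
        f"FOR SELECT USING tenant_id = {tenant_id} TO {user}"
    )


print(SHARED_TABLE_DDL)
print(row_policy_sql(42, "tenant_42_user"))
```

If isolation is handled in the application instead, the same restriction comes from always adding a tenant_id filter to every query; because tenant_id leads the sorting key, the primary index can skip other tenants' data.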

I have experience with customers who have 700 Replicated tables: it is a constant struggle with replication, the background pools need adjusting, and there are problems with ZooKeeper (huge DB size, enormous network traffic between ClickHouse and ZooKeeper). ClickHouse takes 4 hours to start because it needs to read metadata from all 1,000,000 parts. Partition pruning is slower because it iterates through all parts during query analysis for every query.

The source of the issue is the original design; they had about 3 tables in Metrika, I guess.

For example, see https://github.com/ClickHouse/ClickHouse/issues/31919
