
For RediSearch, is it better to create a single index or multiple indexes?

I am building an index using RediSearch in a multi-tenant application that has:

  • 150,000 tenants
  • Each tenant has on average 3,500 customers
  • Each customer has 10 fields that will be added to the index
  • All of the fields are TextFields.

The question is: what would be the best practice (performance, memory/storage, flexibility) in such a case?

Should I create one customer_index with a tenant_code field to identify which data belongs to which tenant, or should I create a tenant-specific index?

From my current experience and understanding, tenant-specific indexes would mean many indexes, each holding less data, and would also give me the flexibility to drop and recreate the index for a specific tenant.

In Python, the code would be as below:

Single Customer Index

from redisearch import Client, TextField

client = Client('customer_index')
client.create_index([
    TextField('tenant_code'), TextField('last_name'),
    TextField('first_name'), TextField('other_name'),
])

Tenant Specific Customer Index

# index name embeds the tenant code, e.g. '<tenant_code>_customer_index'
client = Client('tenant_code_customer_index')
client.create_index([
    TextField('last_name'), TextField('first_name'), TextField('other_name'),
])

Because each tenant only has 3,500 customers (relatively few), you'd be better off memory-wise using a single larger index. With so few records, the resource overhead of each index would likely exceed the size of the index itself. Using many indexes would also increase the number of keys in Redis itself, since a new Redis key is created for each indexed term per index. So if you have ~2,000 unique terms per tenant index, you will end up with 300M Redis keys (2k * 150k). In contrast, a single index will leave you with only ~2k keys.
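To make the single-index approach concrete, here is a minimal sketch of how a customer document could be added to the shared index with the redisearch-py client from the question; the document ID and field values are made-up examples:

from redisearch import Client

client = Client('customer_index')

# Each customer document carries its tenant_code as a regular field,
# so one index can serve all tenants.
client.add_document(
    'customer:acme:1001',       # hypothetical document ID
    tenant_code='acme',
    last_name='Smith',
    first_name='Jane',
    other_name='J',
)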

Performance-wise, there shouldn't be any difference either, because the tenant code is itself an inverted index, so a search is unlikely to need to sift through more records in a larger index.
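Scoping a query to one tenant is then just a matter of adding the tenant code to the query string; a rough sketch, where the tenant code 'acme' is an assumed example:

# Only documents whose tenant_code field contains 'acme' are considered,
# thanks to the inverted index on that field.
results = client.search('@tenant_code:acme @last_name:Smith')
for doc in results.docs:
    print(doc.id, doc.last_name, doc.first_name)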

For deletion you can simply gather a list of IDs which match a criterion, e.g. " FT.SEARCH idx @tenant:yourcode ", and call FT.DEL on each of those records individually. I am assuming that this is not an operation that is being performed every five seconds, so you should be fine there.
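A rough sketch of that delete-by-tenant flow with redisearch-py, assuming the single shared index from above (the page size of 100 is arbitrary):

from redisearch import Query

def delete_tenant(client, tenant_code, page_size=100):
    """Delete every indexed document belonging to one tenant."""
    while True:
        # Fetch a batch of document IDs for this tenant; deleted documents
        # drop out of the index, so we can keep reading from offset 0.
        res = client.search(Query('@tenant_code:' + tenant_code).paging(0, page_size))
        if not res.docs:
            break
        for doc in res.docs:
            client.delete_document(doc.id)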

Note that using 150k indexes is probably not even possible right now, because a dedicated indexing thread is created for each index (though an option to perform indexing on a single thread will be available in future releases).
