
ArangoDB: Querying multiple fields at the same time for partial match

I have a database containing product information (SKU, model number, descriptions, etc.) and I'd like to have a relatively quick search function where a user can type in a few letters or a word from any of the text fields and get a list of products that contain that phrase in any of those fields.

The number of items in the database will probably not be more than 100,000.

What would be the easiest way to accomplish this, without creating complex queries?

It sounds like you're looking for an autocomplete. There are numerous ways to do this.

Indexing

No matter which solution you choose, you'll want to put some indices on your data. I recommend adding a skiplist index to every attribute you're going to search, and an additional fulltext index on any long-form text (such as the product description). String comparison uses skiplists, while only a FULLTEXT search will leverage a fulltext index.
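As a rough sketch of that indexing advice, the helper below produces one skiplist definition per searched attribute plus a fulltext definition for each long-form attribute. The function name `indexDefinitions` and the attribute names `sku`, `model` and `description` are assumptions for illustration; the objects follow the `{ type, fields }` shape that `ensureIndex` accepts in arangosh.

```javascript
// Build index definitions: a skiplist for every searched attribute,
// plus a fulltext index for each long-form text attribute.
// A fulltext index covers exactly one attribute, so each gets its own.
function indexDefinitions(searchedFields, longTextFields) {
  const skiplists = searchedFields.map((field) => ({
    type: "skiplist",
    fields: [field],
  }));
  const fulltexts = longTextFields.map((field) => ({
    type: "fulltext",
    fields: [field],
  }));
  return [...skiplists, ...fulltexts];
}

// Hypothetical attribute names for the product collection.
const productIndexes = indexDefinitions(
  ["sku", "model", "description"],
  ["description"]
);

// In arangosh (not run here), assuming a collection named "warehouse":
//   productIndexes.forEach((def) => db.warehouse.ensureIndex(def));
```

Creating the indexes once at setup time is usually enough, since `ensureIndex` leaves an existing matching index in place.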

Querying

You have some choices here.

LIKE

https://docs.arangodb.com/3.1/AQL/Functions/String.html#like

You could run your search with something like:

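// @searchTerm must already contain the LIKE wildcards, e.g. "%abc%" for a substring match;
// the third argument (true) makes the match case-insensitive.
for product in warehouse
    filter like(product.model, @searchTerm, true) or
           like(product.sku, @searchTerm, true)
    return product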

Advantage: simple query syntax, multiple attributes in one search, supports substrings, can search the middle of a body of text.

Disadvantage: relatively slow.
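Because LIKE treats `%` and `_` as wildcards, raw user input usually needs to be escaped and wrapped before it is bound to the query. The sketch below shows one way to do that on the client side. The helper name `toLikePattern` and the bind parameter name `pattern` are assumptions, and the backslash escaping reflects my reading of the LIKE documentation rather than something stated in the answer.

```javascript
// Escape the LIKE wildcards (% and _) and the escape character itself,
// then wrap the term in % so it matches anywhere in the attribute.
function toLikePattern(term) {
  const escaped = term.replace(/[\\%_]/g, (ch) => "\\" + ch);
  return "%" + escaped + "%";
}

// Same query shape as above, with the pattern passed as a bind parameter.
const likeQuery = `
  FOR product IN warehouse
    FILTER LIKE(product.model, @pattern, true)
        OR LIKE(product.sku, @pattern, true)
    RETURN product`;

// Example bind variables for a user typing "ab_12".
const likeBindVars = { pattern: toLikePattern("ab_12") };
```

Keeping the wildcards out of the query text means the same query string can be reused for every search term.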

Fulltext

This is a lot more complex to query, but it is very responsive, and it is the approach my application uses for its autocomplete.

// Bind parameters are not substituted inside string literals,
// so the "prefix:" query is assembled with concat().
let sku = (for result in fulltext("warehouse", "sku", concat("prefix:", @searchTerm))
           return {sku: result.sku, model: result.model, description: result.description})
let model = (for result in fulltext("warehouse", "model", concat("prefix:", @searchTerm))
           return {sku: result.sku, model: result.model, description: result.description})
let description = (for result in fulltext("warehouse", "description", concat("prefix:", @searchTerm))
           return {sku: result.sku, model: result.model, description: result.description})

let resultsMatch = union(sku,model,description)

return resultsMatch

Advantage: Very fast, extremely responsive, can handle very long bodies of text with ease, searches anywhere in a text body.

Disadvantage: Complex query structure, as you need one variable for every attribute you're searching, a fulltext index on each of those attributes, and a union at the end. You may need to do a union of the unioned results depending on how advanced your search needs to be. Doesn't support substring searching.

Raw string comparison

Simply create a query that filters for results greater than or equal to your search term, but less than your search term with the last letter incremented by 1. An example is in the link under the Foxx portion of my answer. This leverages skiplists.
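A minimal sketch of that range trick follows. The upper bound is computed on the client and both bounds are passed as bind parameters. The helper name `prefixRange`, the bind parameter names `lower` and `upper`, and the choice of `product.model` as the searched attribute are assumptions for illustration. Incrementing the last UTF-16 code unit is a simplification that should be fine for plain ASCII model numbers, but it may not line up with the database's collation for other characters.

```javascript
// Compute the half-open range [prefix, prefix with last character + 1).
function prefixRange(prefix) {
  if (prefix.length === 0) {
    throw new Error("prefix must not be empty");
  }
  const lastCode = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(lastCode + 1);
  return { lower: prefix, upper: upper };
}

// Range filter that a skiplist index on product.model can serve.
const prefixQuery = `
  FOR product IN warehouse
    FILTER product.model >= @lower AND product.model < @upper
    RETURN product`;

// Example: a user typing "AB1" searches the range "AB1" <= model < "AB2".
const prefixBindVars = prefixRange("AB1");
```

The same pattern can be repeated for other short attributes such as the SKU, with one skiplist index per attribute.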

Advantage: Very fast as long as the field is not tremendously long. Extremely easy to implement.

Disadvantage: Doesn't support substring searches. Only searches the first part of a string, i.e. you must know the beginning of the field you're searching.

This will work very well for quickly searching something like a model number, where your users will probably know the beginning of it, but poorly for something like a description, in which your users are probably searching for words somewhere in the middle of a body of text.

Foxx

Jan's little Cookbook example is a good place to start:

https://docs.arangodb.com/cookbook/UseCases/PopulatingAnAutocompleteTextbox.html

I would recommend abstracting whatever you do into a Foxx service. This is especially helpful if you need to build AQL queries dynamically inside the database, for example when you have a large number of fields and collections to search and need to generate the fulltext search on the fly.
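To illustrate what generating the fulltext search dynamically could look like, here is a small sketch of a query builder that a Foxx service might call. The function name `buildFulltextQuery` and the variable names `r0`, `r1`, ... are hypothetical. It mirrors the quoted collection-name form used in the fulltext query above and swaps in `UNION_DISTINCT` for `UNION`, so that a product matching in several attributes is returned only once.

```javascript
// Only plain identifiers are allowed, since names are spliced into the query text.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Build one fulltext prefix subquery per attribute and merge the results.
function buildFulltextQuery(collection, fields) {
  const names = [collection, ...fields];
  if (fields.length === 0 || !names.every((n) => IDENTIFIER.test(n))) {
    throw new Error("expected a collection and at least one plain attribute name");
  }
  const lets = fields.map(
    (field, i) =>
      `LET r${i} = (FOR doc IN FULLTEXT("${collection}", "${field}", ` +
      `CONCAT("prefix:", @searchTerm)) RETURN doc)`
  );
  const vars = fields.map((_, i) => `r${i}`);
  // UNION_DISTINCT needs at least two arrays, so a single field is returned directly.
  const ret =
    vars.length === 1
      ? "RETURN r0"
      : `RETURN UNION_DISTINCT(${vars.join(", ")})`;
  return [...lets, ret].join("\n");
}

const generatedQuery = buildFulltextQuery("warehouse", ["sku", "model", "description"]);

// Inside a Foxx service or arangosh (not run here), this could be executed as:
//   const { db } = require("@arangodb");
//   db._query(generatedQuery, { searchTerm: "abc" }).toArray();
```

The search term itself stays a bind parameter, so only the validated collection and attribute names ever become part of the query text.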

Bottom line

Experiment and see which of these works best for you. My best guess is that you will find the fulltext solution the best if you need to search on product descriptions. If you expect your users to always search the first few letters of a field, just use the comparison with a skiplist, as it is very fast.
