
How to search the database for a field which is a substring of the query by using sqlite

Problem description

I want to search for the query Angela in a database table called Variations. The problem is that the database does not contain Angela. It contains Angel. As you can see, the final a is missing.


Searching procedure

The table that I want to query is the following:

"CREATE TABLE IF NOT EXISTS VARIATIONS 
    (ID        INTEGER PRIMARY KEY NOT NULL, 
     ID_ENTITE INTEGER, 
     NAME      TEXT, 
     TYPE      TEXT, 
     LANGUAGE  TEXT);"

To search for the query I am using fts4 because it is faster than LIKE '%...%', especially on a big database with more than 10 million rows. I also cannot use equality, since I am looking for substrings.

  1. I create a virtual table: create virtual table variation_virtual using fts4(ID, ID_ENTITE, NAME, TYPE, LANGUAGE);
  2. I fill the virtual table from VARIATIONS: insert into variation_virtual select * from VARIATIONS;
  3. The selection query is as follows:

     SELECT ID_ENTITE, NAME FROM variation_virtual WHERE NAME MATCH "Angela"; 
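
The three steps above can be reproduced with Python's built-in sqlite3 module. This is only a minimal sketch: it assumes the local SQLite build has FTS4 enabled, and the single sample row and the helper names build_demo_db and fts_search are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create the VARIATIONS table and its FTS4 copy in memory (sample data is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS VARIATIONS
           (ID        INTEGER PRIMARY KEY NOT NULL,
            ID_ENTITE INTEGER,
            NAME      TEXT,
            TYPE      TEXT,
            LANGUAGE  TEXT)"""
    )
    # Hypothetical row: the table stores "Angel", not "Angela".
    conn.execute("INSERT INTO VARIATIONS VALUES (1, 10, 'Angel', 'PERSON', 'EN')")
    # Step 1: the virtual table (requires FTS4 support in this SQLite build).
    conn.execute(
        "CREATE VIRTUAL TABLE variation_virtual "
        "USING fts4(ID, ID_ENTITE, NAME, TYPE, LANGUAGE)"
    )
    # Step 2: copy the rows across.
    conn.execute("INSERT INTO variation_virtual SELECT * FROM VARIATIONS")
    return conn


def fts_search(conn, term):
    """Step 3: the selection query, with the search term bound as a parameter."""
    sql = "SELECT ID_ENTITE, NAME FROM variation_virtual WHERE NAME MATCH ?"
    return conn.execute(sql, (term,)).fetchall()


conn = build_demo_db()
print(fts_search(conn, "Angela"))  # empty list: no row holds the token "angela"
print(fts_search(conn, "Angel"))   # finds the row whose NAME is "Angel"
```

Searching for the stored value itself succeeds, while the longer query string returns nothing, which is the behaviour described in the problem.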

Question

What am I missing in the query? What I am doing is the opposite of checking whether a query is a substring of a string in a table.

You can't use fts4 for this. From the documentation:

SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux';  /* 0.03 seconds */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */

Of course, the two queries above are not entirely equivalent. For example the LIKE query matches rows that contain terms such as "linuxophobe" or "EnterpriseLinux" (as it happens, the Enron E-Mail Dataset does not actually contain any such terms), whereas the MATCH query on the FTS3 table selects only those rows that contain "linux" as a discrete token. Both searches are case-insensitive.

So your query will only match strings that have 'Angela' as a word (at least that is how I interpret 'discrete token').
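
For comparison, the reversed test ("is the stored NAME contained in the query?") can be written in plain SQL with SQLite's built-in instr() function. The sketch below is not something the answer above proposes, and it is an assumption that a full table scan is acceptable: no index can help with this predicate, so it may be slow on a table with millions of rows. The sample rows and the helper name find_names_inside are hypothetical.

```python
import sqlite3


def find_names_inside(conn, query):
    """Return rows whose NAME occurs somewhere inside `query` (ASCII case-insensitive)."""
    sql = """SELECT ID_ENTITE, NAME
             FROM VARIATIONS
             WHERE NAME <> ''
               AND instr(lower(?), lower(NAME)) > 0"""
    return conn.execute(sql, (query,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE VARIATIONS
       (ID INTEGER PRIMARY KEY NOT NULL, ID_ENTITE INTEGER,
        NAME TEXT, TYPE TEXT, LANGUAGE TEXT)"""
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO VARIATIONS VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "Angel", "PERSON", "EN"),
        (2, 20, "Bob", "PERSON", "EN"),
        (3, 30, "Angelina", "PERSON", "EN"),
    ],
)

print(find_names_inside(conn, "Angela"))  # [(10, 'Angel')]
```

Only "Angel" is returned: "Angelina" is longer than the query and "Bob" does not occur in it.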
