Better performance in Lucene Or SQL

Question

I am using a PostgreSQL database.

I have a table named metadatavalue with the following structure:

metadatavalue_id     integer   Primary Key, Auto Increment
metadata_field_id    integer   Foreign Key
text_value           varchar
text_lang            varchar
place                integer

Whenever something is submitted or added, an item with almost 25 metadata fields is created.

The metadatavalue table already contains around 150,000 (one hundred fifty thousand) records.

I am implementing an autocomplete feature for a field, say "Author", which is identified by metadata_field_id in the table.

When I run the query at the PostgreSQL prompt, it takes 1 to 2 seconds to return the result.

QUERY:

SELECT metadatavalue.text_value AS author, count(metadatavalue.text_value) AS count
   FROM metadatavalue
  WHERE (metadatavalue.metadata_field_id IN ( SELECT metadatafieldregistry.metadata_field_id
           FROM metadatafieldregistry
          WHERE metadatavalue.text_value LIKE 'Pra%' AND metadatafieldregistry.metadata_schema_id = 1 AND metadatafieldregistry.element::text = 'contributor'::text))
  GROUP BY metadatavalue.text_value;

Since this is for autocomplete, the query might run 4-5 times while the user types a value.

So I am thinking of implementing a Lucene-based search.

The plan is to first build an index from the back end, and then on each new item creation run a thread to index the new item.

I want to know whether Apache Lucene would be the better choice or whether the SQL can be optimized.

EDIT: There is another table, metadatafieldregistry, which contains the metadata fields. Its key (metadatafieldregistry.metadata_field_id) is used as a foreign key in the metadatavalue table for each value.

Answer 1

For prefix queries on such a small data set, both Solr and PostgreSQL should perform very well, as long as the required columns are indexed correctly.
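As a rough sketch of what "indexed correctly" could mean on the PostgreSQL side for a prefix search: a plain B-tree index can serve LIKE 'Pra%' only when the database uses the C locale, and otherwise the index needs a pattern operator class. The index name below is made up for illustration, and the column is assumed to be varchar as stated in the question.

```sql
-- Hypothetical index name; assumes text_value is varchar, as in the question.
-- varchar_pattern_ops lets the B-tree index be used for left-anchored
-- LIKE patterns even when the database locale is not C.
CREATE INDEX metadatavalue_text_value_prefix_idx
    ON metadatavalue (text_value varchar_pattern_ops);

-- A left-anchored pattern such as 'Pra%' can use the index;
-- a pattern starting with a wildcard ('%Pra%') cannot.
SELECT text_value
  FROM metadatavalue
 WHERE text_value LIKE 'Pra%';
```

Whether the planner actually picks the index depends on the table statistics, so it is worth confirming with EXPLAIN.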

Answer 2

I would say any database will handle at least a million rows gracefully if it is properly indexed. There is no reason for you to get into Lucene or Solr, which would introduce new tasks such as keeping your indexes synchronized with the current state of the DB.

Also, Lucene and Solr are great for free-text searching. This means that if you search for "Bob Marley" in your Lucene "documents", you will get all the documents that contain "Bob Marley", "Marley Bob", only "Bob", only "Marley", or even "Bob...lot of text...Marley". So whether to use Lucene also depends on the use cases you are trying to cover.

From the query you have shown, I think you will get good performance if you index the metadatavalue.text_value, metadatafieldregistry.metadata_schema_id, and metadatafieldregistry.element columns. Also try converting your query to a join rather than an IN query.
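One way the join form might look is sketched below. It assumes metadatafieldregistry.metadata_field_id is unique (for example, the primary key), so the join cannot duplicate rows and the counts match the original query. The table aliases and the index name are illustrative, not taken from the question.

```sql
-- Hypothetical index name; covers the two registry columns the query filters on.
CREATE INDEX metadatafieldregistry_schema_element_idx
    ON metadatafieldregistry (metadata_schema_id, element);

-- Same result as the original IN query, assuming metadata_field_id is
-- unique in metadatafieldregistry. The LIKE filter now sits on the outer
-- table instead of inside a correlated subquery.
SELECT mv.text_value        AS author,
       count(mv.text_value) AS count
  FROM metadatavalue mv
  JOIN metadatafieldregistry mfr
    ON mfr.metadata_field_id = mv.metadata_field_id
 WHERE mv.text_value LIKE 'Pra%'
   AND mfr.metadata_schema_id = 1
   AND mfr.element = 'contributor'
 GROUP BY mv.text_value;
```

Moving the LIKE condition out of the subquery is the main change: the registry lookup no longer depends on each metadatavalue row.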

Thanks

Answer 3

You don't mention the schema of the metadatafieldregistry table (in fact you say you have just one table, but your query uses two).

Look at the EXPLAIN ANALYZE output to see what the query plan is and which step is taking up the time. Your subquery is correlated, which almost certainly isn't a good plan; in general, the schema smells of EAV. You may find a partial index helpful: an index containing only those text values that you want to do a prefix search on (probably restricting on metadata_schema_id and element).
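A possible way to try both suggestions is sketched here. A partial index predicate in PostgreSQL can only reference columns of the indexed table, so the restriction on metadata_schema_id and element has to be expressed through the matching metadata_field_id. The id value 3 and the index name are placeholders, and the sketch assumes the lookup returns a single id.

```sql
-- See the actual plan and timings of the query from the question.
EXPLAIN ANALYZE
SELECT metadatavalue.text_value AS author, count(metadatavalue.text_value) AS count
  FROM metadatavalue
 WHERE metadatavalue.metadata_field_id IN (
         SELECT metadatafieldregistry.metadata_field_id
           FROM metadatafieldregistry
          WHERE metadatavalue.text_value LIKE 'Pra%'
            AND metadatafieldregistry.metadata_schema_id = 1
            AND metadatafieldregistry.element::text = 'contributor'::text)
 GROUP BY metadatavalue.text_value;

-- Find the field id that stands for the author field.
SELECT metadata_field_id
  FROM metadatafieldregistry
 WHERE metadata_schema_id = 1
   AND element = 'contributor';

-- Assumption: the lookup above returned the single id 3 (placeholder value).
-- The partial index then holds only author values, ordered for prefix search.
CREATE INDEX metadatavalue_author_prefix_idx
    ON metadatavalue (text_value varchar_pattern_ops)
 WHERE metadata_field_id = 3;

-- The query has to repeat the index predicate for the planner to use it.
SELECT text_value AS author, count(text_value) AS count
  FROM metadatavalue
 WHERE metadata_field_id = 3
   AND text_value LIKE 'Pra%'
 GROUP BY text_value;
```

If several field ids map to the same element, the predicate would need to cover all of them, and it is worth checking with EXPLAIN that the planner still matches the query to the index.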

solution1
1 2012-07-30 20:40:30

solution2
1 ACCEPTED 2012-09-04 08:42:56

solution3
1 2012-09-04 11:27:25
