
Hive: performance slowing down when using lower function

I need to search for keywords case-insensitively, and I'm using the query below to do that. The logic is fine, but performance drops drastically.

Table info

item_tbl: 558991075
keywords: 2000
SELECT itemname from items i
left join keywords k ON i.id = k.item_id AND lower(i.itemname) LIKE CONCAT('%', lower(k.value), '%')
WHERE k.item_id is null

Is there a way to improve this query's performance?

What is most likely taking the time is the like. If you are going to run this query regularly, rather than as a one-time job, you should try to precompute the result, or turn this into a straightforward join.
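As a rough sketch of the precompute idea, the lower-cased values could be materialized once so that later runs do not call lower() on every row. The table names items_lc and keywords_lc and the columns itemname_lc and value_lc are hypothetical, and ORC is only an assumed storage format. This removes the repeated lower() work but not the like itself:

```sql
-- Hypothetical staging tables: lower-case the text once, not on every run.
CREATE TABLE items_lc STORED AS ORC AS
SELECT id, itemname, lower(itemname) AS itemname_lc
FROM items;

CREATE TABLE keywords_lc STORED AS ORC AS
SELECT item_id, lower(value) AS value_lc
FROM keywords;

-- Same anti-join as the original query, reading the precomputed columns.
SELECT i.itemname
FROM items_lc i
LEFT JOIN keywords_lc k
  ON i.id = k.item_id
 AND i.itemname_lc LIKE concat('%', k.value_lc, '%')
WHERE k.item_id IS NULL;
```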

Is there a delimiter you can use to reduce the number of items you compare with like? When you use like, you basically have to compare every record to every other record.

  • split i.itemname into 'words', with something like explode(split(i.itemname, ' ')) as words
  • join on those 'words', matching lower(k.value) = lower(words)

This would let the join send the data to the right reducer and reduce the number of comparisons.
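A minimal sketch of that approach, assuming item names are space-delimited and each keyword is a single word (the CTE names item_words and matched are made up for illustration). Note that this matches whole words only, so unlike like '%...%' it will not find a keyword that appears inside a longer word:

```sql
WITH item_words AS (
  -- One row per (item, word); split() takes a regex, here a single space.
  SELECT i.id, lower(w.word) AS word
  FROM items i
  LATERAL VIEW explode(split(i.itemname, ' ')) w AS word
),
matched AS (
  -- Pure equality join, so rows can be shuffled by key to the right reducer.
  SELECT DISTINCT iw.id
  FROM item_words iw
  JOIN keywords k
    ON iw.id = k.item_id
   AND iw.word = lower(k.value)
)
-- Keep only the items for which no keyword matched.
SELECT i.itemname
FROM items i
LEFT JOIN matched m ON i.id = m.id
WHERE m.id IS NULL;
```

If the keywords can contain spaces or need substring matching, this rewrite changes the results, so it is worth comparing the output against the original query on a sample first.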
