Objective: Return all URLs beginning with "https://mywebsite.domain.com/as/product/4/"
Given:
https://mywebsite.domain.com/as/product/1/production
https://mywebsite.domain.com/as/product/2/items
https://mywebsite.domain.com/as/product/1/affordability
https://mywebsite.domain.com/as/product/3/summary
https://mywebsite.domain.com/as/product/4/schedule
https://mywebsite.domain.com/as/product/4/resources/summary
Query 1:
WHERE CONTAINS (URL, 'https://mywebsite.domain.com/as/product/4')
Result:
All records returned
Query 2 (added "*" after reading an MSDN article):
WHERE CONTAINS (URL, '"https://mywebsite.domain.com/as/product/4*"')
Result:
No records returned
Any assistance would be greatly appreciated.
You can combine CONTAINS with LIKE, using a subquery, to match only the start of the URL:
SELECT *
FROM (
SELECT *
FROM myTable WHERE CONTAINS (URL, '"https://mywebsite.domain.com/as/product/4/"')
) AS S1
WHERE S1.URL LIKE 'https://mywebsite.domain.com/as/product/4/%'
This way, the slower LIKE operator runs only against the smaller set of records that CONTAINS has already returned.
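The two-stage idea can be sketched outside SQL Server. A cheap token lookup stands in for the full-text index and narrows the candidates. Only those candidates then get the exact starts-with check that LIKE 'prefix%' performs. The tokenizer, the inverted index and the function names below are hypothetical illustrations, not how SQL Server's full-text engine is implemented.

```python
import re
from collections import defaultdict

URLS = [
    "https://mywebsite.domain.com/as/product/1/production",
    "https://mywebsite.domain.com/as/product/2/items",
    "https://mywebsite.domain.com/as/product/1/affordability",
    "https://mywebsite.domain.com/as/product/3/summary",
    "https://mywebsite.domain.com/as/product/4/schedule",
    "https://mywebsite.domain.com/as/product/4/resources/summary",
]

def tokenize(text):
    # Toy word breaker (assumption): keep runs of letters and digits only.
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def build_index(rows):
    # Inverted index (hypothetical stand-in for the full-text index): token -> row ids.
    index = defaultdict(set)
    for row_id, url in enumerate(rows):
        for token in tokenize(url):
            index[token].add(row_id)
    return index

def prefix_search(rows, index, prefix):
    # Stage 1 (plays the role of CONTAINS): rows containing every token of the prefix.
    postings = [index.get(t, set()) for t in tokenize(prefix)]
    candidates = set.intersection(*postings) if postings else set(range(len(rows)))
    # Stage 2 (plays the role of LIKE 'prefix%'): exact starts-with check on survivors only.
    return [rows[i] for i in sorted(candidates) if rows[i].startswith(prefix)]

index = build_index(URLS)
matches = prefix_search(URLS, index, "https://mywebsite.domain.com/as/product/4/")
print(matches)
```

The expensive exact check touches only the two rows that survive the token filter, not all six.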
EDIT1: (if WHERE CONTAINS (URL, '"https://mywebsite.domain.com/as/product/4/"') is not filtering values)
After a lot of searching, the problem appears to be the /. The forward slash isn't in the noise-words file, but I guess it's classed as a delimiter or word breaker and therefore isn't searchable.
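To see why a search term that contains / is hard to match, here is a toy word breaker. Like the behavior guessed at above, it treats / (and other punctuation) as a separator. This is an assumption for illustration only, because SQL Server's real English word breaker has its own rules.

```python
import re

def toy_word_breaker(text):
    # Assumption: every non-alphanumeric character (including '/', ':' and '.') separates words.
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]

tokens = toy_word_breaker("https://mywebsite.domain.com/as/product/4/schedule")
print(tokens)
# The index would store only these words. None of them contains '/', so a search
# term that relies on the slashes cannot be matched character for character.
```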
EDIT2:
I found one suggested solution: / is treated as an English word breaker, and you may change that in the registry under
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceRoot>\MSSearch\Language\eng
and HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceRoot>\MSSearch\Language\enu
so that SQL Server considers https://mywebsite.domain.com/as/product/4 as one word.
Note: both paths above assume you are using the English word breaker.
Read more about word breakers in the MSDN documentation.
Use the LIKE operator:
WHERE URL LIKE 'https://mywebsite.domain.com/as/product/4/%'
The % is a wildcard. This returns all records whose URL starts with the text that comes before the first %.
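A LIKE pattern whose only wildcard is a trailing % behaves like a starts-with test, so the exact text before the % matters. The sketch below mimics just that case in Python. The helper name is hypothetical, and it deliberately rejects _ (another LIKE wildcard) while ignoring [ ] and ESCAPE. It is also case-sensitive, unlike many SQL Server collations. The product/40 URL is a made-up row added to show the difference a trailing / makes.

```python
def like_trailing_percent(value, pattern):
    # Sketch only: supports patterns of the form 'literal%' and nothing else.
    if not pattern.endswith("%") or "%" in pattern[:-1] or "_" in pattern:
        raise ValueError("sketch only handles a single trailing %")
    return value.startswith(pattern[:-1])

urls = [
    "https://mywebsite.domain.com/as/product/4/schedule",
    "https://mywebsite.domain.com/as/product/40/items",  # hypothetical extra row
]
loose = [u for u in urls if like_trailing_percent(u, "https://mywebsite.domain.com/as/product/4%")]
strict = [u for u in urls if like_trailing_percent(u, "https://mywebsite.domain.com/as/product/4/%")]
print(len(loose), len(strict))  # 2 1
```

Without the trailing /, product 40 (and 41, 400, ...) would also match.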
Provided you always search from the start of the string, this ensures the optimizer can use an index. I assume URL is VARCHAR:
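-- Prefix to search for (the trailing / keeps product 40, 41, ... out of the result)
Declare @p varchar(500) ='https://mywebsite.domain.com/as/product/4/'
Declare @maxChar char(1);
-- Generate CHAR(0) .. CHAR(255) and keep the one that sorts highest
-- under the current collation (sys.all_objects just supplies 256 rows)
select @maxChar = max(ch)
from (
select top(256) ch = char(row_number() over(order by (select null)) - 1)
from sys.all_objects) t;
select @maxChar; -- optional: inspect the value
-- ..
WHERE URL > @p AND URL < @p + @maxChar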
When comparing strings, SQL Server pads the shorter one with trailing spaces (see https://support.microsoft.com/en-us/kb/316626). According to http://www.ietf.org/rfc/rfc1738.txt, all characters allowed in a URL are greater than the space character. So the search parameter, 'https://mywebsite.domain.com/as/product/4' for example, will be less than any URL that starts with this parameter and is longer than it.
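The range trick can be checked with a small Python model of the padding rule. In this model, plain code-point order stands in for a real SQL Server collation, and chr(255) stands in for @maxChar. Both are assumptions for the sketch, and the function names are hypothetical.

```python
def sql_compare(a, b):
    # Mimic SQL Server's (var)char comparison: pad the shorter operand with
    # trailing spaces, then compare (code-point order, not a real collation).
    width = max(len(a), len(b))
    a, b = a.ljust(width), b.ljust(width)
    return (a > b) - (a < b)

MAX_CHAR = chr(255)  # assumption: highest single-byte character under code-point order

def in_prefix_range(url, prefix):
    # Equivalent of: WHERE URL > @p AND URL < @p + @maxChar
    return sql_compare(url, prefix) > 0 and sql_compare(url, prefix + MAX_CHAR) < 0

urls = [
    "https://mywebsite.domain.com/as/product/3/summary",
    "https://mywebsite.domain.com/as/product/4/schedule",
    "https://mywebsite.domain.com/as/product/4/resources/summary",
]
p = "https://mywebsite.domain.com/as/product/4/"
print([u for u in urls if in_prefix_range(u, p)])
```

Past the end of the prefix, the padded prefix holds a space and the URL holds a character greater than a space, so the URL sorts above @p. That same character is below @maxChar, so the URL also sorts below @p + @maxChar.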
For similar problems I usually use one of the following solutions, depending on your needs (mainly performance, resources, concurrency, etc.).
The LIKE operator can be your best friend, even with very big tables.
Indexing
First of all, you need to index your url column. With 20+ million records this is not an easy task, and the index could cost you 1.5 to 2.0 GB of disk space, but your query will then run in practically no time (milliseconds).
With an index on the searched column, LIKE 'FixedPattern%' is performed as an index seek, and you cannot improve it any further.
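A fixed-prefix LIKE can seek because an index keeps its keys sorted. Every key that starts with a given prefix therefore sits in one contiguous range, and binary search can find both ends of it. The sketch uses Python's bisect on a sorted list as a stand-in for the B-tree. The upper-bound helper assumes plain code-point ordering rather than a SQL Server collation, and the product/40 row is hypothetical.

```python
from bisect import bisect_left

def prefix_range(sorted_keys, prefix):
    # Start of the range: first key >= prefix.
    lo = bisect_left(sorted_keys, prefix)
    # End of the range: first key >= prefix with its last character incremented
    # (assumes a non-empty prefix whose last character is not the maximum code point).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect_left(sorted_keys, upper, lo)
    return sorted_keys[lo:hi]

keys = sorted([
    "https://mywebsite.domain.com/as/product/1/production",
    "https://mywebsite.domain.com/as/product/2/items",
    "https://mywebsite.domain.com/as/product/1/affordability",
    "https://mywebsite.domain.com/as/product/3/summary",
    "https://mywebsite.domain.com/as/product/4/schedule",
    "https://mywebsite.domain.com/as/product/4/resources/summary",
    "https://mywebsite.domain.com/as/product/40/items",  # hypothetical extra row
])
hits = prefix_range(keys, "https://mywebsite.domain.com/as/product/4/")
print(hits)
```

Both searches cost O(log n) comparisons, and only the matching rows are read afterwards. This is the intuition behind the index seek.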
First solution:
CREATE NONCLUSTERED INDEX [IX_URL] ON [url_table] ([url]);
DECLARE @Domain VARCHAR(100) = 'https://mywebsite.domain.com/'
DECLARE @Path VARCHAR(100) = 'as/product/'
DECLARE @Product VARCHAR(20) = '4'
DECLARE @LikeAll VARCHAR(100) = @Domain + @Path + @Product + '/%'
SELECT url
FROM url_table
WHERE url LIKE @LikeAll
Second solution:
The second option is a bit tricky but very effective.
You said the protocol and domain of the URL are fixed and you need to search on what comes after them.
The following technique can be fine-tuned to match your needs.
The idea is to add a virtual (computed) column to your url table and then add an index on it.
This greatly reduces the index size and improves query performance, at the cost of a very small computing overhead on insert/update.
ALTER TABLE url_table ADD _path AS (SUBSTRING(url, 30, 4000));
CREATE NONCLUSTERED INDEX [IX_PATH] ON [url_table] ([_path]);
DECLARE @Domain VARCHAR(100) = 'https://mywebsite.domain.com/'
DECLARE @Path VARCHAR(100) = 'as/product/'
DECLARE @Product VARCHAR(20) = '4'
DECLARE @LikeMid VARCHAR(100) = @Path + @Product + '/%'
select @Domain + _path -- rebuild the full URL from index data only
FROM url_table
WHERE _path LIKE @LikeMid
Note that we select @Domain + _path instead of url, to avoid accessing the table and work only with the index data.
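The 30 in SUBSTRING(url, 30, 4000) is the 1-based position just after the fixed https://mywebsite.domain.com/ part. Here is a quick Python check of that arithmetic. Python slices are 0-based, so position 30 is index 29, and the variable names are illustrative only.

```python
DOMAIN = "https://mywebsite.domain.com/"
start = len(DOMAIN) + 1  # 1-based start position, as T-SQL SUBSTRING expects
url = "https://mywebsite.domain.com/as/product/4/schedule"
path = url[start - 1:start - 1 + 4000]  # same as SUBSTRING(url, start, 4000)
print(start, path)  # 30 as/product/4/schedule
```

Because DOMAIN + path gives back the original URL, selecting @Domain + _path returns the same value as url.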
If you need other columns from url_table, your best option is:
declare @l table (id int primary key)
insert into @l
select id
from url_table
where _path like @LikeMid
select url
from url_table
where id in (select id from @l)
This is very fast.
Third solution:
This is a variant of the second one.
In your example data, the path contains /product/ followed by a number, which I assume is the product number. You might consider the following:
ALTER TABLE url_table ADD _product AS (cast(substring(url,nullif(CHARINDEX('/product/',url,29)+9,9), CHARINDEX('/',url,nullif(CHARINDEX('/product/',url,29)+9,9))-nullif(CHARINDEX('/product/',url,29)+9,9)) as bigint));
CREATE NONCLUSTERED INDEX [IX_PRODUCT] ON [url_table] ([_product]);
select id, url
from url_table
where _product = 4
This produces a computed column holding the product number as an integer (bigint). The index will be only about 500 MB, and queries on integers will be very fast.
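To check what the _product expression computes, here is a Python transcription of it. It keeps T-SQL's 1-based CHARINDEX/SUBSTRING positions and returns None where NULLIF would yield NULL. The helper functions are hypothetical stand-ins for the T-SQL built-ins. Like the original expression, the code assumes a / follows the product number.

```python
def charindex(needle, haystack, start=1):
    # Like T-SQL CHARINDEX: 1-based result, 0 when not found.
    return haystack.find(needle, max(start, 1) - 1) + 1

def substring(s, start, length):
    # Like T-SQL SUBSTRING with a 1-based start and non-negative length.
    return s[start - 1:start - 1 + length]

def product_number(url):
    begin = charindex("/product/", url, 29) + 9
    if begin == 9:  # NULLIF(..., 9): '/product/' was not found
        return None
    end = charindex("/", url, begin)
    if end == 0:
        # T-SQL would raise an invalid-length error here (negative length).
        raise ValueError("no '/' after the product number")
    return int(substring(url, begin, end - begin))  # CAST(... AS bigint)

print(product_number("https://mywebsite.domain.com/as/product/4/schedule"))  # 4
```

The sketch also shows a limitation of the original expression: a URL that ends right after the product number (for example .../product/4 with no trailing /) would make the computed column fail rather than return NULL.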
Also, the overhead of selecting all columns from url_table is very small, so you can use SELECT * with almost no performance issues.
PS: You can then drop your full-text index and save space and resources.
SELECT * FROM myTable WHERE URL LIKE 'https://mywebsite.domain.com/as/product/4/%'