Categorization column based on text contained in 2 other columns within T-SQL query

I'm building a report in Power BI. I could set up a Power Query custom column using Text.Contains to solve this problem, but the M code would be very long, and I'd rather do this upstream in the SQL query. I have very little SQL experience.

I'm working with website data from Adobe Analytics. Our website URLs and web pages are grouped into categorical segments based on the product/service each URL or web page corresponds to. A segment is defined by a list of URL paths and/or web page names: sometimes 1 path/page, sometimes over 30.

My result needs to be the following table:

Page URL Path | Page Name    | Page Category
varchar(255)  | varchar(255) | varchar(255)

Page URL Path examples:

/careers/starting-your-career/scholarships.html
/services/technology/ecommerce.html

Corresponding Page Name Examples:

Career & Scholarships | Company Name
Digital Transformation | E-Commerce | Company Name

There are a total of 76 page categories/segments to define. This screenshot shows an example of some categories and their definition.

Can anyone help me get started in writing this query?

I tried using CONTAINS, but I believe it only works within a WHERE clause, and I don't think it can be scaled to the extent needed:

SELECT
    post_evar3 as 'Page URL Path',
    post_evar4 as 'Page Name',
    CASE 
        WHEN post_evar3 CONTAINS ('/services/assurance' or 'services/audit' or 'insights/financial-reporting') 
             AND (post_evar3 CONTAINS 'asc-842' OR post_evar4 CONTAINS 'asc 842') 
            THEN 'Audit Services'
        WHEN post_evar3 CONTAINS '/services/strategy-and-management-consulting' 
            THEN 'Business Strategy Operations'
        ELSE 'Other'
    END AS 'Page Category'
FROM
    Marketing.WebAnalytics.WebData
WHERE
    exclude_hit = 0
    AND hit_source = 1
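
A note on the attempt above: in T-SQL, CONTAINS is a full-text predicate written as CONTAINS(column, 'term'), and it requires a full-text index on the column, so it cannot be used as an infix operator the way the query tries to. For plain substring tests, LIKE with % wildcards works inside a CASE expression. What follows is a minimal sketch of the same two categories rewritten that way, assuming the column and table names from the query above. Whether the match is case-insensitive depends on the column's collation, and the spelling 'Business Strategy Operations' is an assumption about the intended label.

```sql
-- Sketch only: LIKE replaces the CONTAINS attempts; patterns are copied from the question.
SELECT
    post_evar3 AS [Page URL Path],
    post_evar4 AS [Page Name],
    CASE
        -- URL matches any of the three paths AND mentions ASC 842 in the URL or the page name
        WHEN (post_evar3 LIKE '%/services/assurance%'
              OR post_evar3 LIKE '%services/audit%'
              OR post_evar3 LIKE '%insights/financial-reporting%')
             AND (post_evar3 LIKE '%asc-842%' OR post_evar4 LIKE '%asc 842%')
            THEN 'Audit Services'
        WHEN post_evar3 LIKE '%/services/strategy-and-management-consulting%'
            THEN 'Business Strategy Operations'  -- assumed spelling of the label
        ELSE 'Other'
    END AS [Page Category]
FROM
    Marketing.WebAnalytics.WebData
WHERE
    exclude_hit = 0
    AND hit_source = 1;
```

CASE returns the first branch whose condition is true, so the order of the WHEN branches decides which category wins when a page matches more than one definition.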

I've read about Full-Text Search and Index solutions, but developing them is over my head, and I don't know whether that method can be used within the Power BI SQL query environment. I've wondered whether I need to put the definition values into their own table and then join it with the WebData table, though defining a category using both Page URL Path AND Page Name throws me for a loop.
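
The separate-table idea could look roughly like the sketch below. Everything about the rules table is hypothetical: the name @CategoryRule, its columns (Category, UrlPattern, NamePattern, Priority), and the two sample rows, which are illustrative and not the real segment definitions. The assumption is that a row matches when every non-NULL pattern matches, and that several rows with the same Category act as an OR. A definition that needs two different URL conditions at once would have to be expanded into extra rows or given an extra pattern column.

```sql
-- Hypothetical rules table: one row per pattern combination that maps to a category.
DECLARE @CategoryRule TABLE (
    Category    varchar(255) NOT NULL,
    UrlPattern  varchar(255) NULL,   -- LIKE pattern for Page URL Path; NULL = no URL condition
    NamePattern varchar(255) NULL,   -- LIKE pattern for Page Name; NULL = no name condition
    Priority    int          NOT NULL -- lower number wins when several rows match
);

-- Illustrative rows only, not the actual 76 definitions.
INSERT INTO @CategoryRule (Category, UrlPattern, NamePattern, Priority)
VALUES
    ('Audit Services', '%/services/audit%', '%asc 842%', 1),
    ('Business Strategy Operations', '%/services/strategy-and-management-consulting%', NULL, 2);

SELECT
    w.post_evar3 AS [Page URL Path],
    w.post_evar4 AS [Page Name],
    COALESCE(r.Category, 'Other') AS [Page Category]
FROM Marketing.WebAnalytics.WebData AS w
OUTER APPLY (
    -- Pick the highest-priority rule whose non-NULL patterns all match this row.
    SELECT TOP (1) cr.Category
    FROM @CategoryRule AS cr
    WHERE (cr.UrlPattern  IS NULL OR w.post_evar3 LIKE cr.UrlPattern)
      AND (cr.NamePattern IS NULL OR w.post_evar4 LIKE cr.NamePattern)
    ORDER BY cr.Priority
) AS r
WHERE w.exclude_hit = 0
  AND w.hit_source = 1;
```

OUTER APPLY with TOP (1) keeps one output row per page even when several rules match, and COALESCE supplies 'Other' when none do. A table variable is used only to keep the sketch self-contained; a permanent table queried the same way may be the safer choice if the Power BI native query does not accept a multi-statement batch.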

The M code for this kind of matching is not large, though execution time can vary:

let
    BufferedTable2 = Table.Buffer(Table2),
    Source = Table.AddColumn(Table1, "Match", (i) =>
        try Table.SelectRows(BufferedTable2, each
            Text.Contains(i[Column1], [Match1], Comparer.OrdinalIgnoreCase)
            and Text.Contains(i[Column2], [Match2], Comparer.OrdinalIgnoreCase)
        )[Return]{0} otherwise null, type text)
in
    Source

