I am trying to analyze how effective the synonyms saved in our webshop are. I have a list of about 5,000 synonyms and want to VLOOKUP them against a list of 1,000,000 queries in Excel. The problem is that each "synonym cell" may contain multiple synonyms separated by a space, and I want to find each of these synonyms in the list of query strings. Whenever there is a match, I want to look up the product attached to that synonym in the corresponding cell and count the matches, to see how many sales I have gotten with the help of synonyms.
Hopefully this explanation is not too complicated and you can help me find each synonym in the search queries. If you have a better idea for doing this more efficiently, that would be even better. :-)
Here is some sample data that shows what I am going for: https://docs.google.com/spreadsheets/d/1UASfryBJ6pQiAqVy8Z6dJ1klJkUzCu4UZZCunjAIDFg/edit?usp=sharing
Feel free to edit it and thanks a lot! Nes
To VLOOKUP a synonym (found in column B of worksheet "synonyms"), you may have to restructure the "synonyms" worksheet like this:
| synonym | product |
| -------- | -------|
| icecream | cake1 |
| sweets | cake1 |
After that, you need to split the query column into several columns (one word per column) and VLOOKUP each of these words in the restructured "synonyms" worksheet.
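If you would rather not restructure the sheets by hand, the same idea can be sketched in Power Query M. This is only a minimal sketch under assumptions: the inline sample rows are hypothetical, the column names (product, query-synonym, query, bought) are taken from the other answer's code, and the step names are made up for illustration. It splits each synonym cell into one synonym per row, splits each query into one word per row, and then does an exact-match join, which plays the role of the VLOOKUP.

```powerquery
let
    // Hypothetical sample data; in the real workbook these come from the two worksheets
    Synonyms = #table(
        type table [product = text, #"query-synonym" = text],
        {{"cake1", "icecream sweets"}, {"cake2", "pie tart"}}),
    Queries = #table(
        type table [query = text, bought = text],
        {{"cheap icecream", "cake1"}, {"fresh bread", "bread1"}}),

    // One synonym per row: split the cell on spaces, then expand the list into rows
    SynonymLists = Table.TransformColumns(Synonyms, {{"query-synonym", each Text.Split(_, " ")}}),
    SynonymRows = Table.ExpandListColumn(SynonymLists, "query-synonym"),
    Lookup = Table.RenameColumns(SynonymRows, {{"query-synonym", "synonym"}}),

    // One query word per row, keeping the original query text alongside
    WithWords = Table.AddColumn(Queries, "word", each Text.Split([query], " ")),
    QueryWords = Table.ExpandListColumn(WithWords, "word"),

    // Exact-match lookup of each word (the counterpart of an exact-match VLOOKUP)
    Joined = Table.NestedJoin(QueryWords, {"word"}, Lookup, {"synonym"}, "match", JoinKind.LeftOuter),
    Result = Table.ExpandTableColumn(Joined, "match", {"synonym", "product"})
in
    Result
```

Words without a synonym keep null in the synonym and product columns, so they can be filtered out afterwards. Note that this is an exact match; it does not handle plurals or misspellings.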
With that much data, and using Excel, I'd suggest using Power Query (available in Windows Excel 2010+ and O365).
The code below uses a fuzzy join with IgnoreCase and an appropriate Threshold, to allow for singular vs. plural words and minor misspellings in the queries. You may need to tweak things like the column splitter and the threshold depending on your real data. Also, I've only allowed for a single synonym and product. The column splitter can be rewritten to handle any number of columns, and I'll look at that if I have time later today.
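One possible way to make the column splitter handle any number of words is to compute the column names instead of hard-coding three of them. The following is a rough sketch, not the author's rewrite: the sample rows are hypothetical, maxWords and colNames are illustrative names, and it assumes the query column contains at least one row and no null values.

```powerquery
let
    // Hypothetical sample standing in for tblQuery
    Source = #table(
        type table [query = text, bought = text],
        {{"cheap vanilla icecream cone", "cake1"}, {"sweets", "cake1"}}),
    #"Add Index" = Table.AddIndexColumn(Source, "Index", 0, 1),

    // Largest number of space-separated words in any query
    maxWords = List.Max(List.Transform(#"Add Index"[query], each List.Count(Text.Split(_, " ")))),

    // Generate {"query.1", ..., "query.N"} instead of hard-coding three names
    colNames = List.Transform({1..maxWords}, each "query." & Text.From(_)),

    // Same splitter as in the full query, but with the computed column names
    splitIt = Table.SplitColumn(#"Add Index", "query",
        Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), colNames)
in
    splitIt
```

Because the later unpivot step drops null cells, shorter queries simply contribute fewer rows. The same pattern could be applied to the query-synonym column of the synonym table.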
I tried to comment the M code to explain things.
Note the table names tblQuery and tblSyno in the two Excel.CurrentWorkbook lines of the code. You may need to change these (or, if reading the data from an external source, change those lines entirely).
M Code
paste into Advanced Editor in PQ
let

    //Read in Query Table and convert to single column of words in query
    Source = Excel.CurrentWorkbook(){[Name="tblQuery"]}[Content],
    #"Changed Type1" = Table.TransformColumnTypes(Source,{{"query", type text}, {"bought", type text}}),

    //add an index column for eventual reconstruction
    #"Add Index" = Table.AddIndexColumn(#"Changed Type1","Index",0,1),

    //may need more splits depending on real data
    splitIt = Table.SplitColumn(#"Add Index", "query", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),
        {"query.1", "query.2", "query.3"}),
    #"Unpivoted Other Columns1" = Table.UnpivotOtherColumns(splitIt, {"Index", "bought"}, "Attribute", "Value"),
    queryTbl = Table.RemoveColumns(#"Unpivoted Other Columns1",{"Attribute"}),

    //Read in synonym table
    //unpivot to convert to two column table
    Source2 = Excel.CurrentWorkbook(){[Name="tblSyno"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source2,{{"product", type text}, {"query-synonym", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "query-synonym", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),
        {"query-synonym.1", "query-synonym.2", "query-synonym.3"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"query-synonym.1", type text}, {"query-synonym.2", type text}, {"query-synonym.3", type text}}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type2", {"product"}, "Attribute", "Value"),
    synoTbl = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),

    //combine tables based on synonyms
    //exact-match alternative, kept for reference:
    //combTbl = Table.NestedJoin(queryTbl,"Value",synoTbl,"Value","Joined",JoinKind.LeftOuter),
    combTbl = Table.FuzzyNestedJoin(queryTbl,"Value",synoTbl,"Value","Joined",JoinKind.LeftOuter,
        [IgnoreCase=true, Threshold=0.9]),

    //extract the synonym (null if the word matched nothing)
    #"Added Custom" = Table.AddColumn(combTbl, "Synonym", each try Table.Column([Joined],"Value"){0}
        otherwise null),
    #"Added Custom4" = Table.AddColumn(#"Added Custom", "Attached Product", each try Table.Column([Joined],"product"){0}
        otherwise null),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom4",{"Joined"}),

    //Recombine by the Index column to recreate the query
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Index"}, {{"Group", each _, type table [bought=nullable text, Index=number, Value=text, Synonym=nullable text, #"Attached Product"=nullable text]}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Index"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "query", each Table.Column([Group],"Value")),
    #"Extracted Values" = Table.TransformColumns(#"Added Custom1",
        {"query", each Text.Combine(List.Transform(_, Text.From), " "), type text}),

    //extract the "bought" column from the group table
    //if there might be more than one product in
    //the "bought" column, need to change this
    #"Added Custom2" = Table.AddColumn(#"Extracted Values", "bought", each
        List.Distinct(Table.Column([Group],"bought")){0}),

    //extract the Matched Synonym column
    //List.First returns null (instead of an error) when no word in the query matched
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Matched Synonym", each List.First(List.RemoveNulls(Table.Column([Group],"Synonym")))),

    //extract the Attached Product column
    #"Added Custom5" = Table.AddColumn(#"Added Custom3", "Attached Product", each List.First(List.RemoveNulls(Table.Column([Group],"Attached Product")))),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom5",{"Group"})
in
    #"Removed Columns2"
[Screenshots: Synonym Table; Results]
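To get from the result table to the count the question asks for, one option is a short follow-up query. This sketch rests on an assumption: a sale is credited to a synonym when the product in the bought column equals the Attached Product of the matched synonym. The sample rows and the step names SynonymSales and SalesPerProduct are hypothetical; the column names follow the output of the query above.

```powerquery
let
    // Hypothetical rows shaped like the output of the query above
    Results = #table(
        type table [query = text, bought = nullable text,
                    #"Matched Synonym" = nullable text, #"Attached Product" = nullable text],
        {
            {"cheap icecream", "cake1", "icecream", "cake1"},
            {"sweets for kids", "cake2", "sweets", "cake1"},
            {"fresh bread", "bread1", null, null}
        }),

    // Keep only queries where the product bought is the one attached to the matched synonym
    // (the null check is needed because null = null evaluates to true in M)
    SynonymSales = Table.SelectRows(Results,
        each [Attached Product] <> null and [bought] = [Attached Product]),

    // Number of such sales per product
    SalesPerProduct = Table.Group(SynonymSales, {"Attached Product"},
        {{"Synonym Sales", each Table.RowCount(_), Int64.Type}})
in
    SalesPerProduct
```

With the three sample rows, only the first query qualifies, so the result is a single row crediting one sale to cake1. In the real workbook, Results would reference the output query instead of the inline table.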