
Complicated VLOOKUP on Excel

I am trying to analyze how effective the synonyms saved on our webshop are. I have a list of about 5,000 synonyms and want to look them up (VLOOKUP) in a list of 1,000,000 queries using Excel. The problem is that each "synonym cell" may contain multiple synonyms separated by a space, and I want to find each of these synonyms in a list of query strings. Whenever there is a match, I want to look up the product attached to that synonym in the corresponding cell and count the matches, to see how many sales I have gotten with the help of synonyms.

Hopefully this explanation is not too complicated and you can help me find each synonym in the search queries. If you have a better idea for doing this more efficiently, that would be even better. :-)

Here is some sample data that shows what I am going for: https://docs.google.com/spreadsheets/d/1UASfryBJ6pQiAqVy8Z6dJ1klJkUzCu4UZZCunjAIDFg/edit?usp=sharing

Feel free to edit it and thanks a lot! Nes

To VLOOKUP a synonym (found in column B of the worksheet "synonyms"), you may have to restructure that worksheet like this:


| synonym  | product |
| -------- | ------- |
| icecream | cake1   |
| sweets   | cake1   |

After that, split the query column into several columns (one word per column) and VLOOKUP each of these words in the restructured synonyms worksheet.

With that much data, and using Excel, I'd suggest using Power Query (available in Windows Excel 2010+ and O365).

  • Read in the queries table
    • Split the query into separate columns by the space delimiter
    • Unpivot to create a single column of the query words
  • Read in the synonyms table
    • Split the synonyms into separate columns by the space delimiter
    • Unpivot to create a single column of the synonyms
  • Do a nested fuzzy join of the two tables
    • Fuzzy, so you can do things like IgnoreCase and set an appropriate Threshold to allow for singular vs. plural words and minor misspellings in the queries
  • Recombine the split column to create an output table

You may need to tweak things like the column splitter and the threshold depending on your real data. Also, I've only allowed for a single synonym and product. The column splitter can be rewritten to handle any number of columns, and I'll look at that if I have time later today.

I tried to comment the M code to explain things.

Note the table names in lines 4 and 18 of the code. You may need to change these (or, if reading them from an external source, change that line entirely).

M Code

Paste into the Advanced Editor in Power Query:

let

//Read in Query Table and convert to single column of words in query
    Source = Excel.CurrentWorkbook(){[Name="tblQuery"]}[Content],
    #"Changed Type1" = Table.TransformColumnTypes(Source,{{"query", type text}, {"bought", type text}}),

    //add an index column for eventual reconstruction
    #"Add Index" = Table.AddIndexColumn(#"Changed Type1","Index",0,1),
   
    //may need more splits depending on real data
    splitIt = Table.SplitColumn(#"Add Index", "query", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), 
        {"query.1", "query.2", "query.3"}),
    #"Unpivoted Other Columns1" = Table.UnpivotOtherColumns(splitIt, {"Index", "bought"}, "Attribute", "Value"),
    queryTbl = Table.RemoveColumns(#"Unpivoted Other Columns1",{"Attribute"}),

    //Read in synonym table
    //unpivot to convert to two column table
    Source2 = Excel.CurrentWorkbook(){[Name="tblSyno"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source2,{{"product", type text}, {"query-synonym", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "query-synonym", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"query-synonym.1", "query-synonym.2", "query-synonym.3"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"query-synonym.1", type text}, {"query-synonym.2", type text}, {"query-synonym.3", type text}}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type2", {"product"}, "Attribute", "Value"),
    synoTbl = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),

    //combine tables based on synonyms
    //combTbl = Table.NestedJoin(queryTbl,"Value",synoTbl,"Value","Joined",JoinKind.LeftOuter),
    combTbl = Table.FuzzyNestedJoin(queryTbl,"Value",synoTbl,"Value","Joined",JoinKind.LeftOuter,
            [IgnoreCase=true, Threshold=0.9]),

    //extract the synonym
    #"Added Custom" = Table.AddColumn(combTbl, "Synonym", each try Table.Column([Joined],"Value"){0} 
                            otherwise null),
    #"Added Custom4" = Table.AddColumn(#"Added Custom", "Attached Product", each try Table.Column([Joined],"product"){0}
                            otherwise null),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom4",{"Joined"}),

    //Recombine by the Index column to recreate the query
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Index"}, {{"Group", each _, type table [bought=nullable text, Index=number, Value=text, Synonym=nullable text, #"Attached Product"=nullable text]}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Index"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "query", each Table.Column([Group],"Value")),
    #"Extracted Values" = Table.TransformColumns(#"Added Custom1", 
        {"query", each Text.Combine(List.Transform(_, Text.From), " "), type text}),
    
    //extract the "bought" column from the group table
    //if there might be more than one product in
    //the "bought" column, need to change this
    #"Added Custom2" = Table.AddColumn(#"Extracted Values", "bought", each 
        List.Distinct(Table.Column([Group],"bought")){0}),

    //extract the Matched Synonym column (null, not an error, when nothing matched)
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Matched Synonym", each List.First(List.RemoveNulls(Table.Column([Group],"Synonym")))),

    //extract the Attached Product column (null, not an error, when nothing matched)
    #"Added Custom5" = Table.AddColumn(#"Added Custom3", "Attached Product", each List.First(List.RemoveNulls(Table.Column([Group],"Attached Product")))),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom5",{"Group"})
in
    #"Removed Columns2"
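
The fixed three-column split in the `splitIt` step could probably be generalized along the following lines. This is a minimal sketch, not a tested replacement for the code above: it turns each query into a list of words with `Text.Split` and expands that list into rows, so the number of words per query no longer matters and the unpivot step is not needed. The inline `#table` sample rows are made up for illustration and stand in for `tblQuery`; the step names `toLists`, `toRows`, `nonEmpty` and `renamed` are hypothetical.

```powerquery
let
    // Hypothetical sample rows standing in for tblQuery
    Source = #table(
        type table [query = text, bought = text],
        {
            {"large chocolate icecream cone", "cake1"},
            {"sweets", "cake2"}
        }),

    // Index column so the words can be regrouped into their query later
    #"Add Index" = Table.AddIndexColumn(Source, "Index", 0, 1),

    // Replace each query string with the list of its words
    toLists = Table.TransformColumns(#"Add Index",
        {{"query", each Text.Split(_, " "), type list}}),

    // One row per word, however many words a query contains
    toRows = Table.ExpandListColumn(toLists, "query"),

    // Drop empty strings produced by repeated spaces
    nonEmpty = Table.SelectRows(toRows, each [query] <> null and [query] <> ""),

    // Use the column name the join step above expects
    renamed = Table.RenameColumns(nonEmpty, {{"query", "Value"}})
in
    renamed
```

The same pattern should apply to the `query-synonym` column of `tblSyno`, with `product` as the column that is carried along.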

[Image: Synonym Table]

[Image: Results]
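
To get from a results table like this to the count the question asks for, one option is to filter and group it. The sketch below assumes that a "sale via synonym" means a query whose matched synonym points at the same product as the `bought` column; that reading is an assumption, so adjust the filter if your definition differs. The sample rows and the names `Results`, `viaSynonym` and `salesByProduct` are hypothetical.

```powerquery
let
    // Hypothetical rows shaped like the output of the query above
    Results = #table(
        type table [query = text, bought = text,
                    #"Matched Synonym" = nullable text,
                    #"Attached Product" = nullable text],
        {
            {"icecream cone", "cake1", "icecream", "cake1"},
            {"sweets box",    "cake2", "sweets",   "cake1"},
            {"red shoes",     "shoe9", null,       null}
        }),

    // Keep queries where a synonym matched and its product is the one bought
    viaSynonym = Table.SelectRows(Results,
        each [Attached Product] <> null and [bought] = [Attached Product]),

    // One row per product with the number of such sales
    salesByProduct = Table.Group(viaSynonym, {"Attached Product"},
        {{"Sales via synonym", each Table.RowCount(_), Int64.Type}})
in
    salesByProduct
```

With these three sample rows only the first one passes the filter, so the result is a single row for `cake1` with a count of 1. `Table.RowCount(viaSynonym)` would give the overall total instead of a per-product breakdown.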
