
Complicated VLOOKUP on Excel

I am trying to analyze how effective the synonyms saved on our webshop are. I have a list of about 5,000 synonyms and want to look them up in a list of 1,000,000 queries using Excel. The problem is that each "synonym cell" may hold multiple synonyms separated by a space, and I want to find each of these synonyms in the list of query strings. Whenever there is a match, I want to find the product attached to that synonym in the referring cell ("VLOOKUP") and count the matches to see how many sales I have gotten with the help of synonyms.

Hopefully this explanation is not too complicated and you can help me find each synonym in the search queries. If you have a better idea of how to do this more efficiently, that would be even better. :-)

Here is some sample data that shows what I am going for: https://docs.google.com/spreadsheets/d/1UASfryBJ6pQiAqVy8Z6dJ1klJkUzCu4UZZCunjAIDFg/edit?usp=sharing

Feel free to edit it, and thanks a lot! Nes

In order to VLOOKUP a synonym (found in column B of worksheet "synonyms"), you may have to restructure worksheet "synonyms" like this:


| synonym  | product |
| -------- | ------- |
| icecream | cake1   |
| sweets   | cake1   |

After that, you need to split the query column into several columns (one word per column) and VLOOKUP each of these words in the restructured synonyms worksheet.
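The restructuring step can also be done without manual editing. The following is a minimal sketch in Power Query M (the language used in the next answer), not something taken from the original sheet: the inline `#table` is hypothetical sample data standing in for the "synonyms" worksheet, and the `cake2`/`candy` row is invented purely for illustration.

```powerquery
let
    // Hypothetical sample in the shape of the original sheet:
    // one product per row, several space-separated synonyms in one cell
    Source = #table(
        type table [product = text, #"query-synonym" = text],
        {
            {"cake1", "icecream sweets"},
            {"cake2", "candy"}
        }),

    // Turn each synonym cell into a list of single words
    toLists = Table.TransformColumns(Source,
        {{"query-synonym", each Text.Split(_, " "), type list}}),

    // Expand the lists so every synonym gets its own row
    toRows = Table.ExpandListColumn(toLists, "query-synonym"),

    // Rename and reorder to match the two-column layout shown above
    renamed = Table.RenameColumns(toRows, {{"query-synonym", "synonym"}}),
    result = Table.ReorderColumns(renamed, {"synonym", "product"})
in
    result
```

Loading `result` back to a worksheet should give one synonym per row with its product next to it, which is the layout a plain VLOOKUP needs.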

With that much data, and using Excel, I'd suggest using Power Query (available in Windows Excel 2010+ and O365).

  • Read in the queries table
    • split the query into separate columns by the space delimiter
    • unpivot to create a single column of the query words
  • Read in the synonyms table
    • split the synonyms into separate columns by the space delimiter
    • unpivot to create a single column of the synonyms
  • Do a nested fuzzy join of the two tables
    • fuzzy so you can do things like IgnoreCase and set an appropriate Threshold to allow for singular vs. plural words and minor misspellings in the queries
  • Recombine the split column to create an output table

You may need to tweak things like the column splitter and the threshold depending on your real data. Also, I've only allowed for a single synonym and product. The column splitter can be rewritten to handle any number of columns, and I'll look at that if I have time later today.
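One way the splitter might be generalized is to compute the column names from the data instead of hard-coding three of them. This is a sketch under assumptions, not the answerer's follow-up: the inline `#table` is hypothetical sample data, the step names `maxWords` and `colNames` are made up here, and every `query` value is assumed to be non-null text.

```powerquery
let
    // Hypothetical sample standing in for tblQuery
    Source = #table(
        type table [query = text, bought = text],
        {
            {"ice cream cone", "cake1"},
            {"sweets", "cake1"}
        }),

    // Largest number of space-separated words found in any query
    maxWords = List.Max(
        List.Transform(Source[query], each List.Count(Text.Split(_, " ")))),

    // Build the names "query.1" .. "query.N" to match that count
    colNames = List.Transform({1..maxWords}, each "query." & Text.From(_)),

    // Same splitter as in the fixed version, but with the computed names
    splitIt = Table.SplitColumn(Source, "query",
        Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), colNames)
in
    splitIt
```

In the full query, `colNames` would take the place of the literal `{"query.1", "query.2", "query.3"}` list; rows with fewer words get nulls in the extra columns, which the unpivot step then drops.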

I tried to comment the M code to explain things.

Note the table names in lines 4 and 18 of the code. You may need to change these (or, if reading them from an external source, change that line entirely).

M Code

Paste into the Advanced Editor in PQ:

let

//Read in Query Table and convert to single column of words in query
    Source = Excel.CurrentWorkbook(){[Name="tblQuery"]}[Content],
    #"Changed Type1" = Table.TransformColumnTypes(Source,{{"query", type text}, {"bought", type text}}),

    //add an index column for eventual reconstruction
    #"Add Index" = Table.AddIndexColumn(#"Changed Type1","Index",0,1),
   
    //may need more splits depending on real data
    splitIt = Table.SplitColumn(#"Add Index", "query", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), 
        {"query.1", "query.2", "query.3"}),
    #"Unpivoted Other Columns1" = Table.UnpivotOtherColumns(splitIt, {"Index", "bought"}, "Attribute", "Value"),
    queryTbl = Table.RemoveColumns(#"Unpivoted Other Columns1",{"Attribute"}),

    //Read in synonym table
    //unpivot to convert to two column table
    Source2 = Excel.CurrentWorkbook(){[Name="tblSyno"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source2,{{"product", type text}, {"query-synonym", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "query-synonym", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"query-synonym.1", "query-synonym.2", "query-synonym.3"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"query-synonym.1", type text}, {"query-synonym.2", type text}, {"query-synonym.3", type text}}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type2", {"product"}, "Attribute", "Value"),
    synoTbl = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),

    //combine tables based on synonyms
    //combTbl = Table.NestedJoin(queryTbl,"Value",synoTbl,"Value","Joined",JoinKind.LeftOuter),
    combTbl = Table.FuzzyNestedJoin(queryTbl,"Value",synoTbl,"Value","Joined",JoinKind.LeftOuter,
            [IgnoreCase=true, Threshold=0.9]),

    //extract the synonym
    #"Added Custom" = Table.AddColumn(combTbl, "Synonym", each try Table.Column([Joined],"Value"){0} 
                            otherwise null),
    #"Added Custom4" = Table.AddColumn(#"Added Custom", "Attached Product", each try Table.Column([Joined],"product"){0}
                            otherwise null),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom4",{"Joined"}),

    //Recombine by the Index column to recreate the query
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Index"}, {{"Group", each _, type table [bought=nullable text, Index=number, Value=text, Synonym=nullable text, #"Attached Product"=nullable text]}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Index"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "query", each Table.Column([Group],"Value")),
    #"Extracted Values" = Table.TransformColumns(#"Added Custom1", 
        {"query", each Text.Combine(List.Transform(_, Text.From), " "), type text}),
    
    //extract the "bought" column from the group table
    //if there might be more than one product in 
        //the "bought" column, need to change this
    #"Added Custom2" = Table.AddColumn(#"Extracted Values", "bought", each 
        List.Distinct(Table.Column([Group],"bought")){0}),

    //extract the Matched Synonym column
    //List.First returns null (instead of raising an error) when no word in the query matched
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Matched Synonym", each List.First(List.RemoveNulls(Table.Column([Group],"Synonym")))),

    //extract the Attached Product column
    #"Added Custom5" = Table.AddColumn(#"Added Custom3", "Attached Product", each List.First(List.RemoveNulls(Table.Column([Group],"Attached Product")))),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom5",{"Group"})
in
    #"Removed Columns2"

Synonym Table

[image not preserved]

Results

[image not preserved]
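To get from that output to the count the question asks for, the result could be filtered and grouped. This is a rough sketch under stated assumptions: a "sale via synonym" is taken to mean a row where the `bought` product equals the `Attached Product`, which is only one reading of the question. The inline `#table` rows are hypothetical, and the column name `Sales via synonym` is invented here.

```powerquery
let
    // Hypothetical rows in the shape of the query's output table
    Result = #table(
        type table [query = text, bought = nullable text,
                    #"Matched Synonym" = nullable text,
                    #"Attached Product" = nullable text],
        {
            {"icecream cone", "cake1", "icecream", "cake1"},
            {"sweets box", "cake2", "sweets", "cake1"},
            {"blue chair", "chair1", null, null}
        }),

    // Keep only rows where a synonym matched and the attached product was bought
    // (assumption: this is what counts as a sale helped by a synonym)
    matched = Table.SelectRows(Result,
        each [Attached Product] <> null and [bought] = [Attached Product]),

    // Count the remaining rows per attached product
    counts = Table.Group(matched, {"Attached Product"},
        {{"Sales via synonym", each Table.RowCount(_), Int64.Type}})
in
    counts
```

If any match should count regardless of what was bought, dropping the `[bought] = [Attached Product]` condition from the filter would give that total instead.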
