Complicated VLOOKUP on Excel
I am trying to analyze the efficiency of the synonyms I have saved on our webshop. I have a list of about 5,000 synonyms and want to look them up with VLOOKUP in a list of 1,000,000 queries in Excel. The problem is that each "synonym cell" may contain multiple synonyms separated by a space. I want to find these synonyms in a list of query strings. Eventually, whenever there is a match, I want to find the product attached to that synonym in the referring cell ("VLOOKUP") and count the matches to see how many sales I have gotten with the help of synonyms.
Hopefully this explanation is not too complicated and you can help me find each synonym in the search queries. If you have a better idea of how to do this more efficiently, that would be even better. :-)
Here is some sample data that shows what I am going for: https://docs.google.com/spreadsheets/d/1UASfryBJ6pQiAqVy8Z6dJ1klJkUzCu4UZZCunjAIDFg/edit?usp=sharing
Feel free to edit it, and thanks a lot! Nes
In order to VLOOKUP a synonym (found in column B of worksheet "synonyms"), you may have to restructure worksheet "synonyms" like this:
| synonym | product |
| -------- | -------|
| icecream | cake1 |
| sweets | cake1 |
After that, you need to split the query column into several columns (one word per column) and VLOOKUP each of these words in the restructured synonyms worksheet.
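The restructure-and-look-up idea can also be sketched outside worksheet formulas, for instance in Power Query's M language. The following is a minimal sketch, not the answerer's method: the synonym rows are taken from the table above, while the two queries and all step names are made up for illustration. It puts one query word per row rather than per column, then does an exact-match left join, which plays the role of VLOOKUP with exact matching.

```powerquery
let
    // Restructured synonyms: one synonym per row (values from the table above)
    synonyms = #table(
        type table [synonym = text, product = text],
        {{"icecream", "cake1"}, {"sweets", "cake1"}}),
    // Hypothetical queries, for illustration only
    queries = #table(
        type table [query = text],
        {{"buy sweets online"}, {"fresh bread"}}),
    // Add a list of the words in each query, then expand to one row per word
    withWords = Table.AddColumn(queries, "word", each Text.Split([query], " "), type {text}),
    words = Table.ExpandListColumn(withWords, "word"),
    // Exact-match lookup of every word in the synonym table
    joined = Table.NestedJoin(words, "word", synonyms, "synonym", "match", JoinKind.LeftOuter),
    // Bring the attached product next to each word (null when the word is not a synonym)
    result = Table.ExpandTableColumn(joined, "match", {"product"})
in
    result
```

Words that are not synonyms keep a null product, so they can be filtered out afterwards.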
With that much data, and using Excel, I'd suggest using Power Query (available in Windows Excel 2010+ and O365).
Use a fuzzy join with options such as IgnoreCase, and set an appropriate Threshold, to allow for singular vs. plural words and minor misspellings in the queries.
You may need to tweak things like the column splitter and the threshold depending on your real data. Also, I've only allowed for a single synonym and product.
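To get a feel for the threshold before running the full query, a small experiment on inline data can help. The sketch below uses made-up sample words and hypothetical step names; only Table.FuzzyNestedJoin and its IgnoreCase and Threshold options come from the query in this answer. Which near-matches are accepted depends on the threshold, so try a few values against words from your own data.

```powerquery
let
    // Made-up query words: different casing, a plural, and a non-synonym
    queryWords = #table(
        type table [Value = text],
        {{"Sweets"}, {"icecreams"}, {"bread"}}),
    // Made-up synonym table in the unpivoted two-column shape
    synonyms = #table(
        type table [Value = text, product = text],
        {{"sweets", "cake1"}, {"icecream", "cake1"}}),
    // Fuzzy left join; lower the Threshold to accept looser matches,
    // raise it towards 1 to demand near-exact matches
    joined = Table.FuzzyNestedJoin(
        queryWords, "Value", synonyms, "Value", "Joined",
        JoinKind.LeftOuter, [IgnoreCase = true, Threshold = 0.9]),
    // Show the matched synonym and its product beside each query word
    result = Table.ExpandTableColumn(
        joined, "Joined", {"Value", "product"}, {"Synonym", "product"})
in
    result
```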
The column splitter can be rewritten to handle any number of columns, and I'll look at that if I have time later today.
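One possible way to lift the fixed three-column limit is to generate the column names from the data. This is a sketch under assumptions, not the author's promised rewrite: the inline table is a hypothetical stand-in for the indexed query table, and maxWords and colNames are illustrative names. It also assumes the query column contains no nulls.

```powerquery
let
    // Hypothetical stand-in for the indexed query table
    indexed = #table(
        type table [query = text, bought = text, Index = Int64.Type],
        {{"buy sweets online today", "cake1", 0}, {"icecream", "cake1", 1}}),
    // Largest number of space-separated words found in any query
    maxWords = List.Max(
        List.Transform(indexed[query], each List.Count(Text.Split(_, " ")))),
    // Build "query.1" .. "query.N" instead of hard-coding three names
    colNames = List.Transform({1..maxWords}, each "query." & Text.From(_)),
    // Same splitter as in the answer's query, now with generated column names
    splitIt = Table.SplitColumn(
        indexed, "query",
        Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), colNames)
in
    splitIt
```

The same pattern would apply to the query-synonym column of the synonym table. The later unpivot step drops the null cells that shorter queries leave behind.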
I tried to comment the M code to explain things.
Note the table names tblQuery and tblSyno in the Source and Source2 steps of the code. You may need to change these (or, if reading the data from an external source, change those lines entirely).
M Code
(paste into the Advanced Editor in Power Query)
let
    //Read in Query Table and convert to single column of words in query
    Source = Excel.CurrentWorkbook(){[Name="tblQuery"]}[Content],
    #"Changed Type1" = Table.TransformColumnTypes(Source,{{"query", type text}, {"bought", type text}}),
    //add an index column for eventual reconstruction
    #"Add Index" = Table.AddIndexColumn(#"Changed Type1","Index",0,1),
    //may need more splits depending on real data
    splitIt = Table.SplitColumn(#"Add Index", "query", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),
        {"query.1", "query.2", "query.3"}),
    #"Unpivoted Other Columns1" = Table.UnpivotOtherColumns(splitIt, {"Index", "bought"}, "Attribute", "Value"),
    queryTbl = Table.RemoveColumns(#"Unpivoted Other Columns1",{"Attribute"}),
    //Read in synonym table
    //unpivot to convert to two column table
    Source2 = Excel.CurrentWorkbook(){[Name="tblSyno"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source2,{{"product", type text}, {"query-synonym", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "query-synonym", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),
        {"query-synonym.1", "query-synonym.2", "query-synonym.3"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"query-synonym.1", type text}, {"query-synonym.2", type text}, {"query-synonym.3", type text}}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type2", {"product"}, "Attribute", "Value"),
    synoTbl = Table.RemoveColumns(#"Unpivoted Other Columns",{"Attribute"}),
    //combine tables based on synonyms
    //exact-match alternative to the fuzzy join:
    //combTbl = Table.NestedJoin(queryTbl,"Value",synoTbl,"Value","Joined",JoinKind.LeftOuter),
    combTbl = Table.FuzzyNestedJoin(queryTbl,"Value",synoTbl,"Value","Joined",JoinKind.LeftOuter,
        [IgnoreCase=true, Threshold=0.9]),
    //extract the synonym (first match only; null if no match)
    #"Added Custom" = Table.AddColumn(combTbl, "Synonym", each try Table.Column([Joined],"Value"){0}
        otherwise null),
    //extract the product attached to that synonym
    #"Added Custom4" = Table.AddColumn(#"Added Custom", "Attached Product", each try Table.Column([Joined],"product"){0}
        otherwise null),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom4",{"Joined"}),
    //Recombine by the Index column to recreate the query
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Index"}, {{"Group", each _, type table [bought=nullable text, Index=number, Value=text, Synonym=nullable text, #"Attached Product"=nullable text]}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"Index"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Columns1", "query", each Table.Column([Group],"Value")),
    #"Extracted Values" = Table.TransformColumns(#"Added Custom1",
        {"query", each Text.Combine(List.Transform(_, Text.From), " "), type text}),
    //extract the "bought" column from the group table
    //if there might be more than one product in
    //the "bought" column, need to change this
    #"Added Custom2" = Table.AddColumn(#"Extracted Values", "bought", each
        List.Distinct(Table.Column([Group],"bought")){0}),
    //extract the Matched Synonym column
    //{0}? returns null instead of raising an error when no word in the query matched
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Matched Synonym", each List.RemoveNulls(Table.Column([Group],"Synonym")){0}?),
    //extract the Attached Product column
    #"Added Custom5" = Table.AddColumn(#"Added Custom3", "Attached Product", each List.RemoveNulls(Table.Column([Group],"Attached Product")){0}?),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom5",{"Group"})
in
    #"Removed Columns2"
Synonym Table
Results
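The question also asks for a count of sales obtained through synonyms, which the query above does not compute. The sketch below shows one way a follow-up step might do that. The inline table is a hypothetical stand-in with the same four columns as the result, and the rule "a sale counts when the bought product equals the attached product" is an assumption about what the asker means, so adjust it to your own definition.

```powerquery
let
    // Hypothetical stand-in for the output of the main query
    matches = #table(
        type table [query = text, bought = text,
                    #"Matched Synonym" = nullable text,
                    #"Attached Product" = nullable text],
        {
            {"buy sweets", "cake1", "sweets", "cake1"},
            {"fresh bread", "bread1", null, null},
            {"icecream", "pie2", "icecream", "cake1"}
        }),
    // Assumption: keep rows where a synonym matched and the product
    // that was bought is the one attached to that synonym
    hits = Table.SelectRows(matches,
        each [Attached Product] <> null and [bought] = [Attached Product]),
    // Number of such sales per product
    perProduct = Table.Group(hits, {"Attached Product"},
        {{"Sales", each Table.RowCount(_), Int64.Type}})
in
    perProduct
```

With this stand-in data only the first row qualifies, so the result has a single row for cake1. In a real workbook the matches step would reference the main query by name instead of an inline table.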