
Extracting rows in Arrayformula

I think this is a simple syntax problem. I am extracting keywords from a string. The keyword set is in one column and the source strings are in a separate column.

I want the keyword result for each string to appear in the adjacent column, using a single ARRAYFORMULA-type construct at the head of the result column. I'm open to QUERY, FILTER, or any other type of dynamic array formula.

The real-world spreadsheet has a test string column of indeterminate size, ranging from zero entries to around 4000 depending on the build query. The keyword column is also dynamic and changes when the system needs to append or delete keywords. It's currently only around 60 rows. There is a limit of four results per string, with no particular priority among the matching keywords and no constraint on the order in which they appear.

A keyword can be any number of words, so 'Tree' and 'Tall Tree' would be two rows. The longer keyword string always takes priority. For example, the keyword result of the string 'I have a tall tree in my garden' would be 'Tall Tree' and not 'Tree, Tall Tree'.
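The longest-keyword rule can be illustrated outside Sheets. The following is a minimal Python sketch, not part of the original spreadsheet; the function name `longest_first_pattern` and the keyword 'Tree House' are hypothetical. It assumes a regex engine that takes the first alternative that matches at a given position (Python's `re` does, and I believe the RE2 engine behind Sheets behaves the same way by default). Under that assumption, sorting keywords longest-first keeps a short keyword from shadowing a longer one that starts with it.

```python
import re

def longest_first_pattern(keywords):
    # Sort longer keywords first so a multi-word keyword is tried
    # before any shorter keyword it contains.
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile("(" + alternation + ")", re.IGNORECASE)

pattern = longest_first_pattern(["Tree", "Tall Tree"])
matches = pattern.findall("I have a tall tree in my garden")
print(matches)  # ['tall tree']
```

In the 'Tall Tree' case the longer keyword also starts earlier in the string, so a leftmost match would pick it regardless of order. The ordering matters when the shorter keyword is a prefix of the longer one, as with the hypothetical pair 'Tree' and 'Tree House'.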

A short example: my keyword set (column A, given the Named Range 'myWords' here):

    ate
    blue
    cat
    the cat
    for
    dead
    bob
    alive

My strings to test (column B):

Bob ate the dead cat
The cat ate live bob
No cat ate live dog
Bob is dead
Bob and the cat are alive

My expected results (column C):

Bob, Ate, Dead, Cat
The Cat, Ate, Bob
Cat, Ate
Bob, Dead
Bob, The Cat, Alive

The example spreadsheet is here.

If I copy the following formula down the column, I get my expected results.

=PROPER(TEXTJOIN(", ",TRUE,ArrayFormula(IFERROR(REGEXREPLACE($A2,REGEXREPLACE($A2,"(?i)("&TEXTJOIN("|",TRUE,myWords)&")","(.*)"),{"$1","$2","$3","$4"})))))
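As I read it, the copy-down formula works in two passes: the inner REGEXREPLACE swaps every keyword in the string for `(.*)`, which turns the string itself into a pattern, and the outer REGEXREPLACE matches that pattern against the original string and pulls out capture groups 1 to 4. A rough Python emulation of that idea, offered as a sketch rather than an exact equivalent (the function name `extract_keywords` is hypothetical, and Python's `re` stands in for the Sheets regex engine):

```python
import re

def extract_keywords(text, keywords, limit=4):
    # Pass 1 (inner REGEXREPLACE): replace each keyword with a capture group,
    # e.g. "Bob ate the dead cat" -> "(.*) (.*) the (.*) (.*)"
    keyword_pattern = "(?i)(" + "|".join(keywords) + ")"
    template = re.sub(keyword_pattern, "(.*)", text)
    # Pass 2 (outer REGEXREPLACE): match the template against the original text.
    # The text is used as a pattern without escaping, as in the formula.
    match = re.fullmatch(template, text)
    if match is None:
        return []
    # {"$1","$2","$3","$4"} with IFERROR: keep at most `limit` groups
    return list(match.groups()[:limit])

my_words = ["ate", "blue", "cat", "the cat", "for", "dead", "bob", "alive"]
found = extract_keywords("Bob ate the dead cat", my_words)
# PROPER + TEXTJOIN(", ", ...)
print(", ".join(word.title() for word in found))  # Bob, Ate, Dead, Cat
```

Because the groups are greedy, this relies on the literal text between keywords being distinctive enough to pin each group in place, which holds for the sample strings.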

Good result, unwanted method:

Bob, Ate, Dead, Cat
The Cat, Ate, Bob
Cat, Ate
Bob, Dead
Bob, The Cat, Alive

If I construct an ARRAYFORMULA version, I get the right result, but all in the first row.

=arrayformula(PROPER(TEXTJOIN(", ",TRUE,IFERROR(REGEXREPLACE($A$2:$A$20,REGEXREPLACE($A$2:$A$20,"(?i)("&TEXTJOIN("|",TRUE,myWords)&")","(.*)"),{"$1","$2","$3","$4"})))))

Unwanted result, preferred type of method:

Bob, Ate, Dead, Cat, The Cat, Ate, Bob, Cat, Ate, Bob, Dead, Bob, The Cat, Alive
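A likely explanation for the single-row output, as far as I understand TEXTJOIN: it is an aggregating function that consumes its whole range argument and returns one cell, so wrapping it in ARRAYFORMULA does not make it run once per row. The Python sketch below contrasts the two behaviours; the `rows` data is taken from the example, and both function names are hypothetical.

```python
rows = [
    ["Bob", "Ate", "Dead", "Cat"],
    ["The Cat", "Ate", "Bob", ""],
    ["Cat", "Ate", "", ""],
    ["Bob", "Dead", "", ""],
    ["Bob", "The Cat", "Alive", ""],
]

def join_all(rows):
    # What TEXTJOIN does with a 2D range: one string for the whole grid,
    # skipping blanks (the TRUE argument).
    return ", ".join(cell for row in rows for cell in row if cell)

def join_per_row(rows):
    # What the question actually wants: one joined string per row.
    return [", ".join(cell for cell in row if cell) for row in rows]

print(join_all(rows))
for line in join_per_row(rows):
    print(line)
```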

I feel the answer is going to be embarrassingly simple, but I just can't get it!

UPDATE 31/12/2020: The answer below by player0 is the perfect solution. The example spreadsheet has been updated with this answer.

Use:

=ARRAYFORMULA(REGEXREPLACE(TRIM(FLATTEN(QUERY(TRANSPOSE(
 PROPER(IFERROR(REGEXREPLACE(B2:B14, REGEXREPLACE(B2:B14, "(?i)("&
 TEXTJOIN("|", 1, myWords)&")", "(.*)"), 
 {"$1,", "$2,", "$3,", "$4,"})))),,9^9))), ",$", ))
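The per-row join here appears to rely on a known QUERY behaviour: with the headers argument set to a huge number (9^9), every row is treated as a header row, and multi-row headers are merged into one cell per column, separated by spaces. After TRANSPOSE, each original row is a column, so each row's captures end up joined in one cell. FLATTEN turns the resulting row back into a column, TRIM removes the spaces left by blank cells, and the final REGEXREPLACE drops the trailing comma. A Python sketch of those steps, under that reading (the function name `smush_rows` is hypothetical, and rows are assumed to be padded to equal length):

```python
import re

def smush_rows(rows):
    # TRANSPOSE: each source row becomes a column
    transposed = list(zip(*rows))
    # QUERY(..., , 9^9): all rows act as headers, so each column's cells
    # are merged into a single cell separated by spaces
    merged = [" ".join(column) for column in zip(*transposed)]
    # FLATTEN: the single merged row becomes a column (already a list here)
    # TRIM: drop leading, trailing, and repeated spaces
    trimmed = [re.sub(r" +", " ", cell).strip() for cell in merged]
    # REGEXREPLACE(..., ",$", ): remove the trailing comma
    return [re.sub(r",$", "", cell) for cell in trimmed]

captures = [
    ["Bob,", "Ate,", "Dead,", "Cat,"],
    ["The Cat,", "Ate,", "Bob,", ""],
    ["Bob,", "Dead,", "", ""],
]
result = smush_rows(captures)
print(result)  # ['Bob, Ate, Dead, Cat', 'The Cat, Ate, Bob', 'Bob, Dead']
```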


Update:

=ARRAYFORMULA(REGEXREPLACE(TRIM(FLATTEN(QUERY(TRANSPOSE(
 PROPER(IFERROR(REGEXREPLACE(REGEXREPLACE(B2:B14, "\+", "♂"), 
 REGEXREPLACE(REGEXREPLACE(B2:B14, "\+", "♂"), "(?i)("&
 TEXTJOIN("|", 1, Answer!myWords)&")", "(.*)"), 
 {"$1,", "$2,", "$3,", "$4,"})))),,9^9))), ",$", ))
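The answer does not say why the update swaps "+" for "♂". My reading is that the source string is reused as a regex pattern, so a literal "+" would be interpreted as a quantifier and the pattern would no longer match the string it was built from. The Python sketch below shows that failure and one alternative that is available in Python but not directly in the formula: escaping the literal pieces. The function name `extract_keywords_escaped` and the test string are hypothetical.

```python
import re

# Using the text as a pattern unescaped: " +" now means "one or more spaces",
# so the template no longer matches the string it came from.
naive_match = re.fullmatch("(.*) + (.*)", "Bob + the cat")
print(naive_match)  # None

def extract_keywords_escaped(text, keywords, limit=4):
    keyword_re = re.compile(
        "(" + "|".join(re.escape(k) for k in keywords) + ")", re.IGNORECASE
    )
    # re.split with a capturing group returns literal pieces at even indexes
    # and the matched keywords at odd indexes.
    pieces = keyword_re.split(text)
    template = "".join(
        "(.*)" if index % 2 else re.escape(piece)
        for index, piece in enumerate(pieces)
    )
    match = re.fullmatch(template, text)
    return list(match.groups()[:limit]) if match else []

my_words = ["ate", "blue", "cat", "the cat", "for", "dead", "bob", "alive"]
print(extract_keywords_escaped("Bob + the cat", my_words))  # ['Bob', 'the cat']
```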

