Trying to extract a text pattern from the beginning and the end of a cell in Excel
I have the following data, and this is what I would like to see in the Result column:
Data | Result |
---|---|
PN 65011:2020text text text PN 65011:2020 | PN 65011:2020, PN 65011:2020 |
PN 45014-1:2017text text text text PN 65014-1:2017 PN 8726-1:2017/P11:2020 | PN 45014-1:2017, PN 65014-1:2017, PN 8726-1:2017/P11:2020 |
PN 6534:2020text text text text | PN 6534:2020 |
PN 65014-1:2017text text text text PN 65014-1:2017/PC1:2013 | PN 65014-1:2017, PN 65014-1:2017/PC1:2013 |
PN ESO 67345:2019text text text PN 65018-1:2019/PC2:2020 | PN ESO 67345:2019, PN 65018-1:2019/PC2:2020 |
PN ESO/EOC 5320:2013text text text PN ESO 27380:2019 PN 65015-1:2020/PC:2021 | PN ESO/EOC 5320:2013, PN ESO 27380:2019, PN 65015-1:2020/PC:2021 |
I have used ="PN "&TEXTJOIN(", PN ",1,IF(ISNUMBER(SEARCH("/",TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))))),TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))),LEFT(TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))),MIN(IFERROR(FIND({" "},LOWER(TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))))),""))-1)))
This almost gives me what I want, except for the last row (the one starting with PN ESO/EOC 5320:2013): I don't get the numbers, and the result stops at "PN ESO". Like this:
Data | Result |
---|---|
PN ESO/EOC 5320:2013text text PN ESO 27380:2019 text PN 65015-1:2020/PC:2021 | PN ESO/EOC, PN ESO |
Any ideas on how I can get the entire reference?
Thank you very much in advance.
Here is an example of how you could approach this using Excel O365.
Formula in B2:
=TEXTJOIN(", ",,LET(X,FILTERXML("<t><s>"&SUBSTITUTE(A2,"PN ","</s><s>PN ")&"</s></t>","//s[position() > 1]"),Y,LEFT(X,FIND("|",SUBSTITUTE(X,":","|",LEN(X)-LEN(SUBSTITUTE(X,":",""))))+4),Y))
The idea here is to first use SUBSTITUTE() to turn every instance of "PN " into a valid XML construct (the start of a new <s> node) that FILTERXML() can query with XPath. FILTERXML() then returns all values as an array, each one obviously still carrying the concatenated "text text text". The XPath "//s[position() > 1]" skips the first node, which holds whatever precedes the first "PN " (empty in these examples). Therefore I used LET() to load the array into a variable and apply some string manipulation to all elements.
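To make the split step concrete outside Excel, here is a rough Python analogue. It is an illustration only, not Excel's behaviour: the function name split_on_pn is hypothetical, and a NUL character stands in for the "</s><s>" node boundary, on the assumption that cells never contain one.

```python
def split_on_pn(cell: str) -> list[str]:
    """Rough analogue of the SUBSTITUTE + FILTERXML step (hypothetical helper).

    A new piece starts at every "PN ". As with "//s[position() > 1]", the
    first piece (whatever precedes the first "PN ", empty here) is dropped.
    """
    # "\x00" plays the role of "</s><s>"; assumes cells never contain NUL
    marked = cell.replace("PN ", "\x00PN ")
    return marked.split("\x00")[1:]


print(split_on_pn("PN 65011:2020text text text PN 65011:2020"))
# ['PN 65011:2020text text text ', 'PN 65011:2020']
```

Each piece still ends with the trailing text, which is exactly why the next step is needed.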
First, I substituted the last occurrence of the colon in each string with a pipe symbol (the instance number passed to SUBSTITUTE() is the number of colons, LEN(X)-LEN(SUBSTITUTE(X,":",""))), which FIND() then locates, returning its position. With those positions, LEFT() extracts the proper substrings: everything up to the last colon plus the four characters after it (the year). Finally, TEXTJOIN() joins the resulting array back together.
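The truncation and join steps can likewise be sketched in Python. This assumes, as in the sample data, that every piece contains at least one colon followed by a four-digit year; truncate_piece and extract_refs are hypothetical names, and Python's 0-based rfind() replaces the 1-based FIND().

```python
def truncate_piece(piece: str) -> str:
    """Mirror LEFT(X, FIND("|", SUBSTITUTE(X, ":", "|", n)) + 4) (hypothetical helper).

    n is the number of colons, so the pipe marks the LAST colon; LEFT keeps
    everything up to that colon plus the 4 characters after it (the year).
    Assumes the piece contains a colon (Excel's FIND would return #VALUE!).
    """
    last_colon = piece.rfind(":")       # 0-based index; FIND is 1-based
    return piece[: last_colon + 1 + 4]  # up to and including the colon, plus 4 chars


def extract_refs(pieces: list[str]) -> str:
    """TEXTJOIN(", ", , ...) over the truncated pieces."""
    return ", ".join(truncate_piece(p) for p in pieces)


print(extract_refs(["PN 65014-1:2017text text text text ", "PN 65014-1:2017/PC1:2013"]))
# PN 65014-1:2017, PN 65014-1:2017/PC1:2013
```

Cutting at the last colon rather than the first is what keeps suffixes such as /PC1:2013 intact.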
If you can accept a VBA solution, regular expressions are well suited to this kind of problem, provided your data all follows the pattern shown in your examples.
We use a regex that looks for substrings that start with "PN" and extend up to a colon followed by digits; if a / follows, it then looks for the next set, again up to a colon followed by digits.
To enter this User Defined Function (UDF), press <alt-F11> to open the Visual Basic Editor. Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code below into the window that opens.
To use this User Defined Function (UDF), enter a formula like =extrPN(cell_Ref) in some cell.
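Option Explicit
Function extrPN(S As String) As String
    Dim RE As Object, MC As Object, M As Object
    ' "PN", then everything up to the first colon and its digits,
    ' optionally followed by "/...:digits" (e.g. /PC1:2013)
    Const sPat As String = "PN[^:]+:\d+(?:/[^:]+:\d+)?"
    Dim sTemp As String
    ' Late-bound VBScript RegExp object (no library reference needed)
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True          ' return every match, not just the first
        .Pattern = sPat
        .IgnoreCase = False     ' "PN" must be upper case
        If .Test(S) = True Then
            Set MC = .Execute(S)
            ' Build ", match1, match2, ..." then drop the leading ", "
            For Each M In MC
                sTemp = sTemp & ", " & M
            Next M
            extrPN = Mid(sTemp, 3)
        Else
            extrPN = "no match"
        End If
    End With
End Function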
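If you want to check the pattern without opening Excel, here is a hypothetical Python port of extrPN using the re module. The constructs used here (character classes, \d, a non-capturing optional group) behave the same way in Python's re and VBScript's RegExp, but they are separate engines, so treat this as a sketch rather than a faithful emulation.

```python
import re

# Same pattern as sPat in the VBA code. Case sensitive by default,
# matching .IgnoreCase = False.
PN_PATTERN = re.compile(r"PN[^:]+:\d+(?:/[^:]+:\d+)?")


def extr_pn(s: str) -> str:
    """Hypothetical Python port of extrPN: all matches joined with ", "."""
    # findall returns every non-overlapping match (like .Global = True);
    # the group is non-capturing, so whole matches are returned.
    matches = PN_PATTERN.findall(s)
    return ", ".join(matches) if matches else "no match"


print(extr_pn("PN ESO/EOC 5320:2013text text text PN ESO 27380:2019 PN 65015-1:2020/PC:2021"))
# PN ESO/EOC 5320:2013, PN ESO 27380:2019, PN 65015-1:2020/PC:2021
```

Because [^:]+ simply runs to the first colon, references with extra words or slashes before the number (PN ESO/EOC 5320:2013) are captured whole, which is where the original formula failed.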
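Explanation of Regex
extract PN
PN[^:]+:\d+(?:/[^:]+:\d+)?
Options: case sensitive; all matches returned (.IgnoreCase = False and .Global = True in the code above)
- PN matches the character string "PN" literally
- [^:]+ matches any character that is NOT a colon, between one and unlimited times, as many times as possible, giving back as needed (greedy)
- : matches the colon character
- \d+ matches a single character that is a "digit", between one and unlimited times, as many times as possible, giving back as needed (greedy)
- (?:/[^:]+:\d+)? matches the group below, between zero and one times, as many times as possible, giving back as needed (greedy)
  - / matches the character "/" literally
  - [^:]+ matches any character that is NOT a colon, between one and unlimited times (greedy)
  - : matches the colon character
  - \d+ matches a single character that is a "digit", between one and unlimited times (greedy)
Created with RegexBuddy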
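To see why the optional group matters, compare the full pattern with a variant that omits it, using Python's re module (FULL and BASE are illustrative names, not part of the answer's code). Without the group, a reference such as PN 65014-1:2017/PC1:2013 is cut off at the slash.

```python
import re

FULL = r"PN[^:]+:\d+(?:/[^:]+:\d+)?"  # the pattern explained above
BASE = r"PN[^:]+:\d+"                 # same pattern without the optional "/...:digits" group

cell = "PN 65014-1:2017text text text text PN 65014-1:2017/PC1:2013"

print(re.findall(BASE, cell))  # ['PN 65014-1:2017', 'PN 65014-1:2017']
print(re.findall(FULL, cell))  # ['PN 65014-1:2017', 'PN 65014-1:2017/PC1:2013']

# Matching is case sensitive unless a flag says otherwise,
# like .IgnoreCase = False in the VBA code
print(re.findall(FULL, "pn 6534:2020"))                 # []
print(re.findall(FULL, "pn 6534:2020", re.IGNORECASE))  # ['pn 6534:2020']
```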