Trying to extract a string of text pattern from the beginning and the end of a cell in Excel

I have the following data, and the Result column shows what I would like to see:

Data | Result
--- | ---
PN 65011:2020text text text PN 65011:2020 | PN 65011:2020, PN 65011:2020
PN 45014-1:2017text text text text PN 65014-1:2017 PN 8726-1:2017/P11:2020 | PN 45014-1:2017, PN 65014-1:2017, PN 8726-1:2017/P11:2020
PN 6534:2020text text text text | PN 6534:2020
PN 65014-1:2017text text text text PN 65014-1:2017/PC1:2013 | PN 65014-1:2017, PN 65014-1:2017/PC1:2013
PN ESO 67345:2019text text text PN 65018-1:2019/PC2:2020 | PN ESO 67345:2019, PN 65018-1:2019/PC2:2020
PN ESO/EOC 5320:2013text text text PN ESO 27380:2019 PN 65015-1:2020/PC:2021 | PN ESO/EOC 5320:2013, PN ESO 27380:2019, PN 65015-1:2020/PC:2021

I have used this formula:

="PN "&TEXTJOIN(", PN ",1,IF(ISNUMBER(SEARCH("/",TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))))),TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))),LEFT(TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))),MIN(IFERROR(FIND({" "},LOWER(TRIM(MID(SUBSTITUTE(A2,"PN ",REPT(" ",LEN(A2))),(ROW(INDIRECT("1:"&LEN(A2))))*LEN(A2)-(LEN(A2)-1),LEN(A2))))),""))-1)))

With this I get almost what I would like to see, except for the last row (the one starting with PN ESO/EOC 5320:2013), where I don't get the numbers. It stops at PN ESO. Like this:

Data | Result
--- | ---
PN ESO/EOC 5320:2013text text PN ESO 27380:2019 text PN 65015-1:2020/PC:2021 | PN ESO/EOC, PN ESO

Any ideas on how I can get the entire reference?

Thank you very much in advance.

Here is an example of how you could approach this using Excel O365:


Formula in B2:

=TEXTJOIN(", ",,LET(X,FILTERXML("<t><s>"&SUBSTITUTE(A2,"PN ","</s><s>PN ")&"</s></t>","//s[position() > 1]"),Y,LEFT(X,FIND("|",SUBSTITUTE(X,":","|",LEN(X)-LEN(SUBSTITUTE(X,":",""))))+4),Y))

The idea here is to first SUBSTITUTE() every instance of "PN " with a closing and an opening tag, so that the cell text becomes valid XML that an XPath expression can query. FILTERXML() then returns all values as an array (the [position() > 1] predicate skips the empty first node), obviously still with the concatenated "text text text". Therefore I used LET() to load the array into a variable and applied some string manipulation to every element.

First I substituted the last occurrence of the colon in each string with a pipe symbol, which we then FIND() to get its position. With those positions we can extract the proper substrings using LEFT(); the +4 keeps the four characters after that last colon, so this relies on every reference ending in a four-digit year. Finally, TEXTJOIN() joins the resulting array back together.
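For readers whose Excel version lacks FILTERXML() or LET(), the same split-then-trim idea could be sketched as a small UDF. This is only an illustration and not part of the original answer: the name extrPNSplit is made up, and the sketch assumes that every reference ends in a colon followed by a four-digit year and that the free text contains neither a colon nor the sequence "PN ".

```vba
' Hypothetical helper (not from the original answer): mirrors the formula's
' logic using Split / InStrRev / Left instead of FILTERXML / FIND / LEFT.
Function extrPNSplit(S As String) As String
    Dim parts() As String
    Dim i As Long, p As Long
    Dim sTemp As String

    parts = Split(S, "PN ")              ' parts(0) is whatever precedes the first "PN "
    For i = 1 To UBound(parts)
        p = InStrRev(parts(i), ":")      ' position of the last colon in this chunk
        If p > 0 Then
            ' keep everything up to the last colon plus 4 characters (assumed year)
            sTemp = sTemp & ", PN " & Left$(parts(i), p + 4)
        End If
    Next i
    extrPNSplit = Mid$(sTemp, 3)         ' drop the leading ", "
End Function
```

Entered as =extrPNSplit(A2), it should behave like the formula on the sample rows, and it shares the formula's limitation: a chunk whose year is not exactly four digits would be cut in the wrong place.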

If you can accept a VBA solution, regular expressions are well suited for this kind of problem. If your examples are all as you show, we can use a regex that looks for substrings that:

  • start with PN
  • continue through the following characters until they end with a colon followed by one or more digits
  • if a / follows, also include the next run of characters up to another colon-plus-digits pattern

To enter this User Defined Function (UDF), press <alt-F11> to open the Visual Basic Editor. Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code below into the window that opens.

To use this User Defined Function (UDF), enter a formula like =extrPN(cell_Ref) in some cell.

Option Explicit

' Returns every PN reference found in S, joined with ", ".
Function extrPN(S As String) As String
    Dim RE As Object, MC As Object, M As Object
    ' PN, then non-colon characters, a colon and digits,
    ' optionally followed by /...:digits
    Const sPat As String = "PN[^:]+:\d+(?:/[^:]+:\d+)?"
    Dim sTemp As String

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .Pattern = sPat
        .IgnoreCase = False
        If .Test(S) Then
            Set MC = .Execute(S)
            For Each M In MC
                sTemp = sTemp & ", " & M
            Next M
            extrPN = Mid(sTemp, 3)    ' drop the leading ", "
        Else
            extrPN = "no match"
        End If
    End With
End Function
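To check the function without touching the worksheet, a short routine like the one below could be run from the Visual Basic Editor. It is only a sketch: the name demoExtrPN is made up, and it assumes extrPN has been pasted into the same project. The inputs are taken from the question's sample data, and the output appears in the Immediate window (Ctrl+G).

```vba
' Hypothetical test routine; assumes extrPN is defined in the same project.
Sub demoExtrPN()
    Dim sRow As String

    ' Last sample row from the question, the one the original formula truncated
    sRow = "PN ESO/EOC 5320:2013text text text PN ESO 27380:2019 PN 65015-1:2020/PC:2021"
    Debug.Print extrPN(sRow)
    ' prints: PN ESO/EOC 5320:2013, PN ESO 27380:2019, PN 65015-1:2020/PC:2021

    Debug.Print extrPN("PN 6534:2020text text text text")
    ' prints: PN 6534:2020

    Debug.Print extrPN("no reference here")
    ' prints: no match
End Sub
```

Note that the slash inside ESO/EOC is absorbed by the [^:]+ part of the pattern, which is why the first reference comes back whole.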


Explanation of Regex

extract PN

PN.*?:\d+(?:/[^:]+:\d+)?

Options: Case insensitive; ^$ match at line breaks

Created with RegexBuddy
