
Excel formula to find a part number in a file path text string

I have an extract of all the files on a network drive, and some of the file names contain a part number in the format 0000-000000-00. The file holds 600,000+ path names, and I'm trying to figure out how to extract the part numbers from them. I think a MID formula might work, but I am at a loss on how to tell it to find anything in the part number format 0000-000000-00 and extract only those 14 characters from the path.

input looks like this

c:\users\stuff\folder_name\1234-000001-01_ baskets_1.pdf
c:\users\stuff\folder_name\1234-000001-02_ baskets_2.pdf
c:\users\stuff\folder_name\1234-000001-03_ baskets_3.pdf
c:\users\stuff\folder_name\1234-000030-01_ tree_30.pdf
c:\users\stuff\folder_name\random text_1234-000030-02_ tree_30.pdf
c:\users\stuff\folder_name\more random stuff_1234-000030-02_ tree_30.pdf

output I'm hoping for

1234-000001-01
1234-000001-02
1234-000001-03
1234-000030-01

Since you have a pattern we can exploit, use this:

=MID(A1,SEARCH("????-??????-??",A1),14)

SEARCH finds the start of the pattern (each ? wildcard matches any single character), and MID returns the 14 characters starting at that position.
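
One caveat: ? matches any character, not only digits, so a path that contains hyphens elsewhere (in a folder name, say) could match too early. If that is a concern, a stricter check is possible with VBA's Like operator, where # matches exactly one digit. The following is only a minimal sketch; the function name GetPartNumber is hypothetical and not part of the original answer.

```vb
' Hypothetical helper: returns the first 14-character run shaped like
' 0000-000000-00 (digits only), or an empty string when there is none.
Public Function GetPartNumber(ByVal inputString As String) As String
    Const PART_LEN As Long = 14
    Dim pos As Long

    ' Slide a 14-character window across the string.
    For pos = 1 To Len(inputString) - PART_LEN + 1
        ' In a Like pattern, # matches one digit and - is a literal hyphen.
        If Mid$(inputString, pos, PART_LEN) Like "####-######-##" Then
            GetPartNumber = Mid$(inputString, pos, PART_LEN)
            Exit Function
        End If
    Next pos

    GetPartNumber = vbNullString
End Function
```

Placed in a standard module, it could be called from a cell as =GetPartNumber(A1).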


You wanted a formula but a UDF could also be used to apply a regex to get the pattern (a little overkill in this instance but worth being aware of):

Option Explicit
Public Sub GetCustomString()
    Dim i As Long, tests()
    tests = Array("c:\users\stuff\folder_name\1234-000001-01_ baskets_1.pdf", _
    "c:\users\stuff\folder_name\1234-000001-02_ baskets_2.pdf", _
    "c:\users\stuff\folder_name\1234-000001-03_ baskets_3.pdf", _
    "c:\users\stuff\folder_name\1234-000030-01_ tree_30.pdf", _
    "c:\users\stuff\folder_name\random text_1234-000030-02_ tree_30.pdf", _
    "c:\users\stuff\folder_name\more random stuff_1234-000030-02_ tree_30.pdf")

    For i = LBound(tests) To UBound(tests)
        Debug.Print GetString(tests(i))
    Next
End Sub

' Returns the first substring shaped like 0000-000000-00, or "" if there is none.
Public Function GetString(ByVal inputString As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late bound, so no library reference is needed
    With re
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\d{4}-\d{6}-\d{2}"
        If .Test(inputString) Then
            GetString = .Execute(inputString)(0)   ' first match only
        Else
            GetString = vbNullString
        End If
    End With
End Function

Using the UDF in a sheet: with a path in A1, enter =GetString(A1).


Pattern: \d{4}-\d{6}-\d{2}

Explanation:

\d matches a digit (equal to [0-9])
{4} Quantifier: matches exactly 4 times

- matches the character - literally

\d matches a digit (equal to [0-9])
{6} Quantifier: matches exactly 6 times

- matches the character - literally

\d matches a digit (equal to [0-9])
{2} Quantifier: matches exactly 2 times

Global pattern flags:

g modifier (.Global = True): global. All matches (don't return after first match).
m modifier (.MultiLine = True): multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string).

GetString reads only the first match and the pattern contains no ^ or $, so neither flag changes its result.
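
Because the global flag makes Execute return every match, a path containing more than one part number could be handled by looping over the match collection. This is a sketch only; the function name GetAllStrings and the "; " default delimiter are assumptions, not part of the original answer.

```vb
' Hypothetical helper: returns every part number found in inputString,
' joined by delimiter (assumed default "; "), or "" when none is found.
Public Function GetAllStrings(ByVal inputString As String, _
                              Optional ByVal delimiter As String = "; ") As String
    Dim re As Object, matches As Object, m As Object
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True                      ' keep searching after the first match
    re.Pattern = "\d{4}-\d{6}-\d{2}"

    Set matches = re.Execute(inputString)
    For Each m In matches
        If Len(result) > 0 Then result = result & delimiter
        result = result & m.Value
    Next m

    GetAllStrings = result
End Function
```

For a path such as "a_1234-000030-01_b_1234-000030-02.pdf" this would return both part numbers separated by the delimiter.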


 