VBA regex: extract multiple strings between strings within Excel cell with custom function

Within an Excel column I have data such as:

"Audi (ADI), Mercedes (modelx) (MEX), Ferrari super fast, high PS (FEH)"

There are hundreds of models, each described by a name and an abbreviation of three capitalized letters in parentheses.

I need to extract only the names and only the abbreviations into separate cells. I succeeded in doing this for the abbreviations with the following module:

Function extrABR(cellRef) As String
    Dim RE As Object, MC As Object, M As Object
    Dim sTemp As String
    ' Regex matching an abbreviation: three or four uppercase letters
    Const sPat As String = "([A-Z][A-Z][A-Z][A-Z]?)"

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .MultiLine = True
        .Pattern = sPat
        If .Test(cellRef) Then
            Set MC = .Execute(cellRef)
            ' Collect every match, each prefixed with ", "
            For Each M In MC
                sTemp = sTemp & ", " & M.SubMatches(0)
            Next M
        End If
    End With

    ' Drop the leading ", "
    extrABR = Mid(sTemp, 3)
End Function

However, I have not managed to do the same for the names. I thought of simply swapping in the following regex: (^(.*?)(?= \([A-Z][A-Z][A-Z])|(?<=, )(.*)(?= \([A-Z][A-Z][A-Z])) , but VBA does not seem to allow lookbehind.

Any ideas?

Correct, lookbehinds are not supported, but they are only necessary when your expected matches overlap. That is not the case here: all your matches are non-overlapping. So you can again rely on capturing:

(?:^|,)\s*(.*?)(?=\s*\([A-Z]{3,}\))

See the regex demo. Group 1 values are accessed via .SubMatches(0).

Details:

  • (?:^|,) - either the start of the string or a comma
  • \s* - zero or more whitespace chars
  • (.*?) - capturing group 1: any zero or more chars other than line break chars, as few as possible
  • (?=\s*\([A-Z]{3,}\)) - a positive lookahead that matches a location immediately followed by
    • \s* - zero or more whitespace chars
    • \( - a ( char
    • [A-Z]{3,} - three or more uppercase letters
    • \) - a ) char.
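
To show how the pattern slots into the question's module, here is a minimal sketch modelled on extrABR. The function name extrNames and the "; " separator are assumptions for illustration, not part of the original answer; a semicolon is used because a name may itself contain a comma.

```vba
' Hypothetical companion to extrABR: returns the model names joined with "; ".
Function extrNames(cellRef) As String
    Dim RE As Object, MC As Object, M As Object
    Dim sTemp As String
    ' Pattern from the answer: the name is captured in group 1.
    Const sPat As String = "(?:^|,)\s*(.*?)(?=\s*\([A-Z]{3,}\))"

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .MultiLine = True
        .Pattern = sPat
        If .Test(cellRef) Then
            Set MC = .Execute(cellRef)
            For Each M In MC
                ' SubMatches(0) holds capturing group 1, i.e. the name
                sTemp = sTemp & "; " & M.SubMatches(0)
            Next M
        End If
    End With

    ' Drop the leading "; " (two characters)
    extrNames = Mid(sTemp, 3)
End Function
```

For the sample string in the question, this should return "Audi; Mercedes (modelx); Ferrari super fast, high PS". The lowercase "(modelx)" stays in the name because the lookahead only accepts parentheses containing three or more uppercase letters.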

RE.Replace: try this function. Anything between parentheses, including the parentheses themselves, is replaced with "", leaving a string of model names only, which you can then split on commas to get a string array if desired.

Function ModelNames(cellRef) As String
    Dim RE As Object
    Dim sPat As String
    ' Matches "(", then one or more chars other than ")", then ")"
    sPat = "\([^)]+\)"
    ' Or you can use your formula pattern "([A-Z][A-Z][A-Z][A-Z]?)" to get (modelx) in the final output.

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .MultiLine = True
        .Pattern = sPat
    End With

    ' Remove every parenthesised group from the cell text
    ModelNames = RE.Replace(cellRef, "")
End Function
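
One thing to be aware of with the function above: it removes only the parenthesised text, so the space that preceded each group stays behind (for example "Audi ," and a trailing space at the end). A possible variation, sketched here under the assumption that those spaces are unwanted, also consumes the whitespace before each group. The name ModelNamesTidy is hypothetical.

```vba
' Hypothetical variant of ModelNames that also removes the whitespace
' in front of each parenthesised group.
Function ModelNamesTidy(cellRef) As String
    Dim RE As Object

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        ' \s* swallows the spaces before "(" so no gap is left behind
        .Pattern = "\s*\([^)]+\)"
    End With

    ModelNamesTidy = RE.Replace(cellRef, "")
End Function
```

For the sample string in the question, this should give "Audi, Mercedes, Ferrari super fast, high PS". Like the original, it drops "(modelx)" as well, and splitting the result on commas would cut "Ferrari super fast, high PS" in two, so the capturing approach is likely the safer choice when names can contain commas.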
