
How to use Regex to match discontinuous strings

This question is probably simple for others, but I'm new to RegEx and to this forum, and haven't been able to find an answer anywhere.

I have emails coming into Microsoft Outlook that generally look like this:


Patient: SMITH, JANE

MRN: 12345678

EncounterID: 1234567890

EncounterDate: Apr 11 2017 12:00AM

Department: NEUROLOGY

Center: Headache

Location: Main Campus

Visit Type: NEW NEUR HEADACHE

Attending Phys: JONES, MARY


I want to have Outlook examine each email as it arrives, select those whose subject line indicates that they have relevant information, then extract the MRN, Last Name of patient, First Name of patient, and Encounter Date.

When a new email arrives, my module runs the following Sub:

Public Sub ProcessImatchKpEmails(item As Outlook.MailItem)
Dim LastName As String
Dim FirstName As String
Dim EncounterDate As String
Dim MRN As String
Dim Body As String

On Error Resume Next

'   Check to make sure it is an Outlook mail message.
    If TypeName(item) <> "MailItem" Then Exit Sub
    Body = item.Body

'   Extract data from the email
    If item.Subject = gImatchKpEmailSubjectNo Or _
        item.Subject = gImatchKpEmailSubjectYes Or _
        item.Subject = gImatchKpEmailSubjectMaybe Then
           MRN = ExtractText(Body, RegPattern("MRN"))
           LastName = ExtractText(Body, RegPattern("LastName"))
           FirstName = ExtractText(Body, RegPattern("FirstName"))
           EncounterDate = ExtractText(Body, RegPattern("EncounterDate"))
    End If
End Sub

The RegPattern function looks like this:

Public Function RegPattern(Lookup As String) As String
'   Creates a regPattern for each type of lookup

On Error Resume Next

    Select Case Lookup
        Case "LastName"
            RegPattern = "Patient\s*[:]+\s*(\w*)\s*"
        Case "FirstName"
            RegPattern = "Patient\s*[:]+\s*(\w*)[,](\w*)\s*"
        Case "EncounterDate"
            RegPattern = "EncounterDate\s*[:]+\s*(\w*)\s*" 
        Case "MRN"
            RegPattern = "MRN\s*[:]+\s*(\d*)\s*"
    End Select

    Debug.Print Lookup, RegPattern

End Function

The ExtractText function looks like this:

Public Function ExtractText(Str As String, RegPattern As String) As String
Dim regEx As New RegExp
Dim numMatches As MatchCollection
Dim M As Match

On Error Resume Next

regEx.Pattern = RegPattern

Set numMatches = regEx.Execute(Str)
If numMatches.Count = 0 Then
    ExtractText = "missing"
Else
    Set M = numMatches(0)
    ExtractText = M.SubMatches(0)
End If

Debug.Print ExtractText
End Function

When I run this, the code picks up the new email, and it manages to pull the MRN (12345678) and Last Name of Patient (Smith) accurately.

However, it also pulls the First Name of Patient as (Smith). Similarly, it pulls the Encounter Date as (Apr), but loses the rest.

Can anybody tell me what the appropriate RegEx code would be to get the patient's first name, as well as the entire Encounter Date?

Thanks for your help.

"Patient\s*[:]+\s*(\w*)[,](\w*)\s*"

The core problem is that you always extract the 0th submatch, but this pattern has two sets of capturing parentheses, so SubMatches(0) returns the first group (the last name). Changing the first set of parentheses into a non-capturing group should help:

"Patient\s*[:]+\s*(?:\w*)[,](\w*)\s*"

Or drop the parentheses around the first \w* altogether, since there is no reason to group the last name when you only want the first name.
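To illustrate the point about submatch indexes, here is a minimal sketch that captures both names with a single pattern and reads each group by its index. ExtractGroup and its GroupIndex parameter are hypothetical names, not part of the original code. The pattern assumes the name line always has the form LAST, FIRST with word characters only, and it allows optional whitespace after the comma because the sample email shows "SMITH, JANE".

```vba
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5".
' ExtractGroup is a hypothetical helper, not part of the original code.
Public Function ExtractGroup(ByVal Source As String, ByVal Pattern As String, _
                             ByVal GroupIndex As Long) As String
    Dim regEx As New RegExp
    Dim matches As MatchCollection

    regEx.Pattern = Pattern
    Set matches = regEx.Execute(Source)

    ' Return "missing" when nothing matched or the requested group does not exist.
    If matches.Count = 0 Then
        ExtractGroup = "missing"
    ElseIf GroupIndex >= matches(0).SubMatches.Count Then
        ExtractGroup = "missing"
    Else
        ExtractGroup = matches(0).SubMatches(GroupIndex)
    End If
End Function

Public Sub DemoNames()
    ' Group 0 is the last name, group 1 is the first name.
    Const NamePattern As String = "Patient\s*:\s*(\w+),\s*(\w+)"
    Dim body As String

    body = "Patient: SMITH, JANE" & vbCrLf & "MRN: 12345678"

    Debug.Print ExtractGroup(body, NamePattern, 0)   ' last name
    Debug.Print ExtractGroup(body, NamePattern, 1)   ' first name
End Sub
```

With one pattern and an explicit group index, the last name and first name come from the same match, so they cannot get out of step with each other.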

Also note that [:] is identical to a plain : , and you probably want to require at least one character for names, which means \w+ instead of \w* .
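Two details of the sample email are worth keeping in mind. There is a space after the comma in "SMITH, JANE", so a pattern that expects a word character immediately after the comma will not pick up the first name. And \w does not match spaces, which is why the date comes back as just "Apr". A possible rewrite of RegPattern that applies the suggestions above is sketched below. The EncounterDate pattern goes beyond what the answer covers: it assumes the date sits on the same line as its label and runs to the end of that line. The name patterns assume names made of word characters only, so a hyphen or apostrophe in a name would cut the capture short.

```vba
Public Function RegPattern(ByVal Lookup As String) As String
    ' Returns a pattern with exactly one capturing group per lookup,
    ' so SubMatches(0) is always the value being asked for.
    Select Case Lookup
        Case "LastName"
            RegPattern = "Patient\s*:\s*(\w+)"
        Case "FirstName"
            ' Match the last name and comma without capturing them.
            RegPattern = "Patient\s*:\s*\w+,\s*(\w+)"
        Case "EncounterDate"
            ' Capture everything up to the end of the line (assumption:
            ' the date is the only thing after the label on that line).
            RegPattern = "EncounterDate\s*:\s*([^\r\n]+)"
        Case "MRN"
            RegPattern = "MRN\s*:\s*(\d+)"
    End Select
End Function
```

The date pattern uses [^\r\n]+ rather than .+ because Outlook bodies usually end lines with a carriage return plus line feed, and . in the VBScript engine matches the carriage return, which would leave a stray character at the end of the captured date. Trailing spaces on the line would still be captured, so wrapping the result in Trim may be worthwhile.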
