How to use Regex to match discontinuous strings
This question is probably simple for others, but I'm new to RegEx and to this forum, and haven't been able to find an answer anywhere.
I have emails coming into Microsoft Outlook that generally look like this:
Patient: SMITH, JANE
MRN: 12345678
EncounterID: 1234567890
EncounterDate: Apr 11 2017 12:00AM
Department: NEUROLOGY
Center: Headache
Location: Main Campus
Visit Type: NEW NEUR HEADACHE
Attending Phys: JONES, MARY
I want Outlook to examine each email as it arrives, select those whose subject line indicates that they contain relevant information, and then extract the MRN, the patient's last name, the patient's first name, and the Encounter Date.
When a new email arrives, my module runs the following Sub:
Public Sub ProcessImatchKpEmails(item As Outlook.MailItem)
    Dim LastName As String
    Dim FirstName As String
    Dim EncounterDate As String
    Dim MRN As String
    Dim Body As String
    On Error Resume Next
    ' Check to make sure it is an Outlook mail message.
    If TypeName(item) <> "MailItem" Then Exit Sub
    Body = item.Body
    ' Extract data from the email
    If item.Subject = gImatchKpEmailSubjectNo Or _
       item.Subject = gImatchKpEmailSubjectYes Or _
       item.Subject = gImatchKpEmailSubjectMaybe Then
        MRN = ExtractText(Body, RegPattern("MRN"))
        LastName = ExtractText(Body, RegPattern("LastName"))
        FirstName = ExtractText(Body, RegPattern("FirstName"))
        EncounterDate = ExtractText(Body, RegPattern("EncounterDate"))
    End If
End Sub
The RegPattern function looks like this:
Public Function RegPattern(Lookup As String) As String
    ' Creates a regPattern for each type of lookup
    On Error Resume Next
    Select Case Lookup
        Case "LastName"
            RegPattern = "Patient\s*[:]+\s*(\w*)\s*"
        Case "FirstName"
            RegPattern = "Patient\s*[:]+\s*(\w*)[,](\w*)\s*"
        Case "EncounterDate"
            RegPattern = "EncounterDate\s*[:]+\s*(\w*)\s*"
        Case "MRN"
            RegPattern = "MRN\s*[:]+\s*(\d*)\s*"
    End Select
    Debug.Print Lookup, RegPattern
End Function
The ExtractText function looks like this:
Public Function ExtractText(Str As String, RegPattern As String) As String
    ' RegExp requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    Dim regEx As New RegExp
    Dim numMatches As MatchCollection
    Dim M As Match
    On Error Resume Next
    regEx.Pattern = RegPattern
    Set numMatches = regEx.Execute(Str)
    If numMatches.Count = 0 Then
        ExtractText = "missing"
    Else
        ' Always returns the first capturing group of the first match
        Set M = numMatches(0)
        ExtractText = M.SubMatches(0)
    End If
    Debug.Print ExtractText
End Function
When I run this, the code picks up the new email, and it manages to pull the MRN (12345678) and the patient's last name (Smith) accurately.
However, it also pulls the patient's first name as (Smith). Similarly, it pulls the Encounter Date as (Apr), but loses the rest.
Can anybody tell me what the appropriate RegEx would be to get the patient's first name, as well as the entire Encounter Date?
Thanks for your help.
"Patient\s*[:]+\s*(\w*)[,](\w*)\s*"
The core problem is that you always extract the 0th submatch, but you have two sets of capturing parentheses. Changing the first set of parentheses into non-capturing ones should help:
"Patient\s*[:]+\s*(?:\w*)[,](\w*)\s*"
Or use no parentheses at all around the first group (the last name), as there's no reason you need grouping there.
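A different route is to keep both capturing groups and read the one you want by index, instead of always taking SubMatches(0). The sketch below assumes a reference to "Microsoft VBScript Regular Expressions 5.5" is set, as the question's ExtractText already requires; ExtractSubMatch and its Index parameter are hypothetical names, not part of the original code. It also assumes the body looks like the sample, with a space after the comma in "SMITH, JANE". The pattern therefore uses `,\s*`: with `[,](\w*)`, the second group matches an empty string at that space.

```vba
' Hypothetical helper (not from the original code): returns capture group
' number Index (0-based) of the first match, or "missing" if nothing matches.
' Requires Tools > References > "Microsoft VBScript Regular Expressions 5.5".
Public Function ExtractSubMatch(ByVal Source As String, _
                                ByVal Pattern As String, _
                                ByVal Index As Long) As String
    Dim regEx As New RegExp
    Dim matches As MatchCollection

    regEx.Pattern = Pattern
    Set matches = regEx.Execute(Source)
    If matches.Count = 0 Then
        ExtractSubMatch = "missing"
    Else
        ExtractSubMatch = matches(0).SubMatches(Index)
    End If
End Function

' Usage against the sample text: one pattern, two groups, pick by index.
Public Sub TestExtractSubMatch()
    Const NAME_PATTERN As String = "Patient\s*:\s*(\w+),\s*(\w+)"
    Dim Body As String
    Body = "Patient: SMITH, JANE" & vbCrLf & "MRN: 12345678"

    Debug.Print ExtractSubMatch(Body, NAME_PATTERN, 0)   ' SMITH (last name)
    Debug.Print ExtractSubMatch(Body, NAME_PATTERN, 1)   ' JANE (first name)
End Sub
```

This way one pattern serves both the LastName and FirstName lookups, at the cost of the caller having to know which group holds which value.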
Also note that [:] is identical to :, and you probably want to capture at least one character for names, which is \w+ instead of \w*.
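For the Encounter Date: `\w` matches only letters, digits and underscore, so `(\w*)` stops at the first space after "Apr". One option, assuming each field sits on its own line of the body, is to capture everything up to the end of the line with `[^\r\n]+`. The sketch below is a possible revision of the question's RegPattern that combines this with the suggestions above (`:` instead of `[:]+`, `\w+` instead of `\w*`, a non-capturing last-name group, and `\s*` after the comma for the space in "SMITH, JANE"). It keeps the wanted value in the first group, because ExtractText always returns SubMatches(0); it has only been reasoned through against the sample text, not against real Outlook bodies.

```vba
' A possible revision of RegPattern (sketch). Every pattern puts the wanted
' value in the FIRST capturing group, since ExtractText returns SubMatches(0).
Public Function RegPattern(Lookup As String) As String
    Select Case Lookup
        Case "LastName"
            RegPattern = "Patient\s*:\s*(\w+)"
        Case "FirstName"
            ' (?:...) matches the last name without capturing it;
            ' \s* after the comma allows for the space in "SMITH, JANE".
            RegPattern = "Patient\s*:\s*(?:\w+),\s*(\w+)"
        Case "EncounterDate"
            ' [^\r\n]+ takes the rest of the line, spaces included.
            RegPattern = "EncounterDate\s*:\s*([^\r\n]+)"
        Case "MRN"
            RegPattern = "MRN\s*:\s*(\d+)"
    End Select
End Function

' Quick check against the sample body (prints to the Immediate window).
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5".
Public Sub TestRegPattern()
    Dim regEx As New RegExp
    Dim matches As MatchCollection
    Dim Body As String

    Body = "Patient: SMITH, JANE" & vbCrLf & _
           "MRN: 12345678" & vbCrLf & _
           "EncounterDate: Apr 11 2017 12:00AM" & vbCrLf & _
           "Department: NEUROLOGY"

    regEx.Pattern = RegPattern("FirstName")
    Set matches = regEx.Execute(Body)
    Debug.Print matches(0).SubMatches(0)   ' JANE

    regEx.Pattern = RegPattern("EncounterDate")
    Set matches = regEx.Execute(Body)
    Debug.Print matches(0).SubMatches(0)   ' Apr 11 2017 12:00AM
End Sub
```

If a line could end in trailing spaces, wrapping the result in Trim would remove them; the sample shown does not have any.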