Using Regex to Match Pattern

I am trying to use a regex to retrieve Title:Code pairs.

(.*?\(CPT-.*?\)|.*?\(ICD-.*?\))

Data:

SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) RIGHT WRIST GANGLION CYST (ICD-727.41) S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)

I would like to capture:

  • SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18)
  • RIGHT WRIST GANGLION CYST (ICD-727.41)
  • S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)

What is the proper regex to use?

What about a pattern like this:

.*?\((CPT|ICD)-[A-Z0-9.]+\)

This matches zero or more of any character, non-greedily, followed by a (, then either CPT or ICD, then a hyphen, then one or more uppercase Latin letters, decimal digits, or periods, and finally a ).

Note that I picked [A-Z0-9.]+ because, to my understanding, all current ICD-9 codes, ICD-10 codes, and CPT codes conform to that pattern.

The C# code might look a bit like this:

var result = Regex.Matches(input, @".*?\((CPT|ICD)-[A-Z0-9.]+\)")
                  .Cast<Match>()
                  .Select(m => m.Value);
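Since the question asks for Title:Code pairs, the same pattern can be extended with named groups so that the title and the code come out separately. The following is only a sketch under that assumption: the class name CodePairs, the method Extract, and the group names title, kind, and code are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CodePairs
{
    // Same pattern as above; the named groups are an illustrative addition.
    private static readonly Regex Pattern =
        new Regex(@"(?<title>.*?)\((?<kind>CPT|ICD)-(?<code>[A-Z0-9.]+)\)");

    // Returns one (Title, Kind, Code) tuple per match, with the title trimmed.
    public static List<(string Title, string Kind, string Code)> Extract(string input)
    {
        return Pattern.Matches(input)
                      .Cast<Match>()
                      .Select(m => (m.Groups["title"].Value.Trim(),
                                    m.Groups["kind"].Value,
                                    m.Groups["code"].Value))
                      .ToList();
    }
}
```

Calling CodePairs.Extract on the sample data should give three tuples, the first with the title SENSORINEURAL HEARING LOSS BILATERAL (MILD), the kind ICD, and the code 389.18.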

If you want to avoid any surrounding whitespace, you can simply trim the result strings (m => m.Value.Trim()), or ensure that the matched prefix starts with a non-whitespace character by putting a \S in front, like this:

var result = Regex.Matches(input, @"\S.*?\((CPT|ICD)-[A-Z0-9.]+\)")
                  .Cast<Match>()
                  .Select(m => m.Value);

Or use a negative lookahead if you need to handle inputs like (ICD-100)(ICD-200). Unlike \S, the lookahead (?!\s) consumes no characters, so a code with an empty title still matches on its own:

var result = Regex.Matches(input, @"(?!\s).*?\((CPT|ICD)-[A-Z0-9.]+\)")
                  .Cast<Match>()
                  .Select(m => m.Value);

You can see a working demonstration here.
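To make the difference between the \S prefix and the negative lookahead concrete, here is a small sketch that runs both patterns on the (ICD-100)(ICD-200) input mentioned above. The class PrefixComparison and its method MatchAll are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixComparison
{
    // Returns the text of every match of the given pattern in the input.
    public static string[] MatchAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Demo()
    {
        const string input = "(ICD-100)(ICD-200)";

        // \S consumes the first "(", so the lazy prefix has to run on
        // until the second code: both codes end up in a single match.
        string[] withNonSpace = MatchAll(input, @"\S.*?\((CPT|ICD)-[A-Z0-9.]+\)");

        // (?!\s) only asserts and consumes nothing, so each code matches separately.
        string[] withLookahead = MatchAll(input, @"(?!\s).*?\((CPT|ICD)-[A-Z0-9.]+\)");

        Console.WriteLine(string.Join(" | ", withNonSpace));
        Console.WriteLine(string.Join(" | ", withLookahead));
    }
}
```

With the \S version the whole input comes back as one match, while the lookahead version returns (ICD-100) and (ICD-200) as two matches.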

You can use the Regex.Split() method:

string input = "SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) RIGHT WRIST GANGLION CYST (ICD-727.41) S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)";
string pattern = @"(?<=\))\s*(?=[^\s(])";
string[] result = Regex.Split(input, pattern);
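As a rough sketch of how this split behaves, the snippet below wraps it in a helper (SplitDemo and SplitEntries are made-up names for illustration). The lookbehind requires a ) just before the split point and the lookahead requires the next non-space character to be something other than (, which is why (MILD) stays attached to its code. One caveat, shown with a hypothetical second input: a parenthesised note in the middle of a title also triggers a split.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits after a ")" whenever the following text does not start with "(".
    public static string[] SplitEntries(string input)
    {
        return Regex.Split(input, @"(?<=\))\s*(?=[^\s(])");
    }

    public static void Demo()
    {
        // Sample data from the question: three entries are expected.
        string[] entries = SplitEntries(
            "SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) " +
            "RIGHT WRIST GANGLION CYST (ICD-727.41) " +
            "S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)");
        foreach (string entry in entries)
        {
            Console.WriteLine(entry);
        }

        // Hypothetical input: the note "(X)" sits mid-title, so the
        // single entry is cut into "A (X)" and "B (ICD-1)".
        foreach (string part in SplitEntries("A (X) B (ICD-1)"))
        {
            Console.WriteLine(part);
        }
    }
}
```

If titles can contain parenthesised text followed by more words, matching on the CPT/ICD code itself is likely the safer choice.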

Consider the following regex:

.*?\d\)

Good luck!
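A quick sketch of how this shorter pattern could be applied, assuming every code ends in a digit immediately before the closing parenthesis. The class DigitParenDemo and its Extract method are hypothetical, and the Trim() call is added here because every match after the first starts with the space that separates entries.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitParenDemo
{
    // Lazily matches up to the first digit that is directly followed by ")".
    public static string[] Extract(string input)
    {
        return Regex.Matches(input, @".*?\d\)")
                    .Cast<Match>()
                    .Select(m => m.Value.Trim())
                    .ToArray();
    }
}
```

On the sample data this yields the three expected entries. It does rest on that assumption, though: a code ending in a letter (some ICD-10 codes do) would not be matched, and a made-up title such as KNEE PAIN (GRADE 2) (ICD-719.46) would be cut after (GRADE 2).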
