
RegEx - Extracting Characters Between Specific Phrases

I need to extract just the measurement units (i.e. "K/UL", "M/UL", "%", etc.) from the following blood test text:

WBC                            4.27-11.40 k/uL                        3.64 (L)
RBC                            3.90-5.03 m/uL                         4.30
Hemoglobin                     10.6-13.4 g/dL                         13.0
Hematocrit                     32.2-39.8 %                            36.1
MCV                            74.4-87.6 fL                           84.0
MCH                            24.8-29.5 pG                           30.2 (H)
MCHC                           31.8-34.9 g/dL                         36.0 (H)
RDW-CV                         12.2-14.4 %                            13.2
Platelet Count                 150-400 k/uL                           175
MPV                            9.2-11.4 fL                            8.6 (L)
Neut%                          28.6-74.5 %                            43.1
Abs Neut (ANC)                 1.63-7.87 k/uL                         1.57 (L)
Lymph%                         15.5-57.8 %                            43.7
Abs Lymph                      0.97-4.28 k/uL                         1.59
Mono%                          4.2-12.3 %                             9.3
Abs Mono                       0.19-0.85 k/uL                         0.34
Eosin%                         0.0-4.7 %                              3.6
Abs Eosin                      0.00-0.52 k/uL                         0.13
Baso%                          0.0-0.7 %                              0.3
Abs Baso                       0.00-0.06 k/uL                         0.01

This means I need to recognize '-' + number + ' ' + unit, and extract only the unit.

I tried to use a negative lookbehind expression, (?<!-[0-9]?!([0-9]*[.])?[0-9]+ )(\\D)+ , which is meant to match non-digits only when they are preceded by a '-' character followed by a float number, but it yields zero matches.

Please note that I'm using SAS (i.e. Perl RegEx).

Use

\d+-\d[\d.]*\s*\K\S+

See proof

Explanation

--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  [\d.]*                   any character of: digits (0-9), '.' (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \K                       match reset operator
--------------------------------------------------------------------------------
  \S+                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
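
As a quick way to check the pattern outside SAS, here is a minimal sketch in Python. It is an illustration, not part of the original answer: Python's built-in re module does not support \K, so the sketch swaps it for a capturing group around \S+, which should return the same text. The names UNIT_RE, sample and extract_units are made up for this example, and sample is a few lines copied from the question.

```python
import re

# Capture-group variant of \d+-\d[\d.]*\s*\K\S+
# (Python's re module has no \K, so the unit is captured in group 1 instead).
UNIT_RE = re.compile(r"\d+-\d[\d.]*\s*(\S+)")

# A few lines taken from the blood test text in the question.
sample = """\
WBC                            4.27-11.40 k/uL                        3.64 (L)
Hematocrit                     32.2-39.8 %                            36.1
MCH                            24.8-29.5 pG                           30.2 (H)
Platelet Count                 150-400 k/uL                           175
"""


def extract_units(text):
    """Return the token that follows each 'low-high' reference range."""
    # With exactly one capturing group, findall returns the group contents.
    return UNIT_RE.findall(text)


units = extract_units(sample)
print(units)  # ['k/uL', '%', 'pG', 'k/uL']
```

If the regex engine in use does not accept \K either, the same capture-group form, \d+-\d[\d.]*\s*(\S+), may be worth trying, reading the unit from group 1 rather than from the whole match.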
