RegEx - Extract characters between specific phrases

Question

I need to extract just the measurement units, i.e. "K/UL", "M/UL", "%", etc., from the following blood test text:

WBC                            4.27-11.40 k/uL                        3.64 (L)
RBC                            3.90-5.03 m/uL                         4.30
Hemoglobin                     10.6-13.4 g/dL                         13.0
Hematocrit                     32.2-39.8 %                            36.1
MCV                            74.4-87.6 fL                           84.0
MCH                            24.8-29.5 pG                           30.2 (H)
MCHC                           31.8-34.9 g/dL                         36.0 (H)
RDW-CV                         12.2-14.4 %                            13.2
Platelet Count                 150-400 k/uL                           175
MPV                            9.2-11.4 fL                            8.6 (L)
Neut%                          28.6-74.5 %                            43.1
Abs Neut (ANC)                 1.63-7.87 k/uL                         1.57 (L)
Lymph%                         15.5-57.8 %                            43.7
Abs Lymph                      0.97-4.28 k/uL                         1.59
Mono%                          4.2-12.3 %                             9.3
Abs Mono                       0.19-0.85 k/uL                         0.34
Eosin%                         0.0-4.7 %                              3.6
Abs Eosin                      0.00-0.52 k/uL                         0.13
Baso%                          0.0-0.7 %                              0.3
Abs Baso                       0.00-0.06 k/uL                         0.01

This means I need to recognize '-' + number + ' ' + unit and extract the unit.

I tried the negative lookbehind expression (?<!-[0-9]?!([0-9]*[.])?[0-9]+ )(\\D)+, which is meant to match only non-digits that follow a '-' character and a float number, but it yields zero matches.
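Two observations about that attempt, offered as a sketch rather than a diagnosis of SAS's behavior. First, (?<!...) is a *negative* lookbehind: it succeeds only when the enclosed pattern does NOT precede the match, while the goal calls for a positive one, (?<=...). Its [0-9]?! also reads as an optional digit followed by a literal '!'. Second, even a positive lookbehind would have to span a number of varying length, and many regex engines, including Python's re, accept only fixed-width lookbehind. The snippet uses Python only because it is easy to run (it is not SAS). It shows re rejecting such a lookbehind and falls back to a capturing group. The variable names are illustrative.

```python
import re

# Two rows copied from the question's sample text.
rows = [
    "WBC                            4.27-11.40 k/uL                        3.64 (L)",
    "Hematocrit                     32.2-39.8 %                            36.1",
]

# A positive lookbehind that has to "see" a number of variable length.
# Python's re only allows fixed-width lookbehind, so compiling this raises re.error.
variable_width = r"(?<=-\d+(?:\.\d+)? )\S+"
try:
    re.compile(variable_width)
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

# Workaround: consume '-' + number + whitespace normally and capture the unit.
capture = re.compile(r"-\d+(?:\.\d+)?\s+(\S+)")
units = [m.group(1) for row in rows if (m := capture.search(row))]

print(lookbehind_supported)  # False
print(units)                 # ['k/uL', '%']
```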

Please note that I'm using SAS (i.e. Perl RegEx).

Answer 1

Use

\d+-\d[\d.]*\s*\K\S+


Explanation

--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  [\d.]*                   any character of: digits (0-9), '.' (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \K                       match reset operator
--------------------------------------------------------------------------------
  \S+                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
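For readers who want to try the answer's pattern outside SAS, here is a rough Python port. Python's standard re module has no \K match-reset operator, so the sketch keeps the pattern identical up to the whitespace and wraps the final \S+ in a capturing group. re.findall then returns only that group. SAMPLE, UNIT_RE and extract_units are hypothetical names used for this illustration, and the sketch does not reproduce SAS's PRX functions.

```python
import re
from typing import List

# The blood-test sample from the question (column spacing does not affect the pattern).
SAMPLE = """\
WBC                            4.27-11.40 k/uL                        3.64 (L)
RBC                            3.90-5.03 m/uL                         4.30
Hemoglobin                     10.6-13.4 g/dL                         13.0
Hematocrit                     32.2-39.8 %                            36.1
MCV                            74.4-87.6 fL                           84.0
MCH                            24.8-29.5 pG                           30.2 (H)
MCHC                           31.8-34.9 g/dL                         36.0 (H)
RDW-CV                         12.2-14.4 %                            13.2
Platelet Count                 150-400 k/uL                           175
MPV                            9.2-11.4 fL                            8.6 (L)
Neut%                          28.6-74.5 %                            43.1
Abs Neut (ANC)                 1.63-7.87 k/uL                         1.57 (L)
Lymph%                         15.5-57.8 %                            43.7
Abs Lymph                      0.97-4.28 k/uL                         1.59
Mono%                          4.2-12.3 %                             9.3
Abs Mono                       0.19-0.85 k/uL                         0.34
Eosin%                         0.0-4.7 %                              3.6
Abs Eosin                      0.00-0.52 k/uL                         0.13
Baso%                          0.0-0.7 %                              0.3
Abs Baso                       0.00-0.06 k/uL                         0.01
"""

# The answer's pattern with `\K\S+` replaced by `(\S+)`, because Python's re
# does not support \K; the unit comes back as capture group 1.
UNIT_RE = re.compile(r"\d+-\d[\d.]*\s*(\S+)")


def extract_units(text: str) -> List[str]:
    """Return the unit that follows each 'low-high' reference range, in order."""
    # With exactly one group in the pattern, findall returns that group's text.
    return UNIT_RE.findall(text)


units = extract_units(SAMPLE)
print(units)
print(sorted(set(units)))  # ['%', 'fL', 'g/dL', 'k/uL', 'm/uL', 'pG']
```

Because the unit is read from a group rather than via \K, this form also works in engines that lack the match-reset operator. In Perl-compatible engines that do support \K, the answer's original pattern returns the unit as the entire match.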

Solution 1
2 2020-10-06 21:32:43