RegEx - Extracting Characters Between Specific Phrase
I need to extract just the measurement units ("K/UL", "M/UL", "%", etc.) from blood test text I've got, such as the following:
WBC 4.27-11.40 k/uL 3.64 (L)
RBC 3.90-5.03 m/uL 4.30
Hemoglobin 10.6-13.4 g/dL 13.0
Hematocrit 32.2-39.8 % 36.1
MCV 74.4-87.6 fL 84.0
MCH 24.8-29.5 pG 30.2 (H)
MCHC 31.8-34.9 g/dL 36.0 (H)
RDW-CV 12.2-14.4 % 13.2
Platelet Count 150-400 k/uL 175
MPV 9.2-11.4 fL 8.6 (L)
Neut% 28.6-74.5 % 43.1
Abs Neut (ANC) 1.63-7.87 k/uL 1.57 (L)
Lymph% 15.5-57.8 % 43.7
Abs Lymph 0.97-4.28 k/uL 1.59
Mono% 4.2-12.3 % 9.3
Abs Mono 0.19-0.85 k/uL 0.34
Eosin% 0.0-4.7 % 3.6
Abs Eosin 0.00-0.52 k/uL 0.13
Baso% 0.0-0.7 % 0.3
Abs Baso 0.00-0.06 k/uL 0.01
This means I need to recognize '-' + number + ' ' + unit, and extract the unit.
I tried a negative lookbehind expression,
(?<!-[0-9]?!([0-9]*[.])?[0-9]+ )(\\D)+
which was meant to match only non-digits when they are preceded by a '-' character followed by a float number, but it yields zero matches.
Please note that I'm using SAS (i.e. Perl RegEx).
Use
\d+-\d[\d.]*\s*\K\S+
Explanation
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  [\d.]*                   any character of: digits (0-9), '.' (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \K                       match reset operator
--------------------------------------------------------------------------------
  \S+                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
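
If the regex engine in use does not accept \K, a capturing group around the final \S+ should give the same result. Below is a minimal sketch of that variant in Python, not SAS, because Python's built-in re module has no \K. The names UNIT_AFTER_RANGE and extract_unit are made up for illustration, and the sample lines are taken from the question. The same capture-group idea should carry over to other Perl-style engines, but that is not tested here.

```python
import re

# Same pattern as above, but with a capturing group instead of \K,
# because Python's built-in re module does not support \K.
# UNIT_AFTER_RANGE and extract_unit are illustrative names only.
UNIT_AFTER_RANGE = re.compile(r"\d+-\d[\d.]*\s*(\S+)")


def extract_unit(line):
    """Return the token that follows the 'low-high' range, or None."""
    match = UNIT_AFTER_RANGE.search(line)
    return match.group(1) if match else None


sample = """\
WBC 4.27-11.40 k/uL 3.64 (L)
Hematocrit 32.2-39.8 % 36.1
RDW-CV 12.2-14.4 % 13.2
Abs Neut (ANC) 1.63-7.87 k/uL 1.57 (L)"""

units = [extract_unit(line) for line in sample.splitlines()]
print(units)  # ['k/uL', '%', '%', 'k/uL']
```

Note that the hyphen in a test name such as RDW-CV is not picked up, because the pattern requires a digit on both sides of the '-'.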