
Python Regex To Ignore Date Pattern

Sample Data:

Weight Measured: 80.7 kg (11/27/1900 24:59:00)
Pulse 64 \F\ Temp 37.3?C (99.1 ?F) \F\ Wt 101.2 kg (223 lb)
Weight as of 11/11/1900 72.2 kg (159 lb 1.6 oz)
Resp. rate 16, height 177.8 cm (5' 10"), weight 84.7 kg (186 lb|
11.2 oz)
And one extra weight example 100lbs

Partially working regex:

\b(?i)(?:weight|wt)\b(?:.){1,25}?\b(\d+\.?(?:\d+)).*?(\w+)\b

Current output:

('80.7', 'kg'), ('101.2', 'kg'), ('11', '11'), ('84.7', 'kg'), ('100', 'lbs')

Expected output:

('80.7', 'kg'), ('101.2', 'kg'), ('72.2', 'kg'), ('84.7', 'kg'), ('100', 'lbs')

How do I make my current regex ignore dates and capture the value that follows? Also, how do I make this regex stop matching at the end of the line?

You may use

re.findall(r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)', text)

See the regex demo.

Details

  • (?i) - same as re.I: turns case-insensitive mode on
  • \b - a word boundary
  • w(?:eigh)?t - wt or weight
  • \b - a word boundary
  • .{1,25}? - any 1 to 25 chars other than line break chars, as few as possible
  • \b - a word boundary
  • (?<!\d/) - a negative lookbehind that fails the match if there is a digit and / immediately to the left of the current location
  • (\d+(?:\.\d+)?) - Group 1: one or more digits followed by an optional sequence of a dot and one or more digits
  • (?!/?\d) - a negative lookahead that fails the match if there is an optional / and a digit immediately to the right of the current location
  • \s* - zero or more whitespace chars
  • (\w+) - Group 2: one or more letters, digits or underscores.
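The breakdown above can also be written directly into the pattern with re.VERBOSE. This is only a sketch: the name WEIGHT_RE is made up for illustration, and the flags are passed as arguments rather than through the inline (?i).

```python
import re

# Same pattern as in the answer, laid out with re.VERBOSE.
# WEIGHT_RE is a name chosen for this sketch only.
WEIGHT_RE = re.compile(
    r"""
    \bw(?:eigh)?t\b      # "wt" or "weight" as a whole word
    .{1,25}?             # 1 to 25 chars on the same line, as few as possible
    \b(?<!\d/)           # the number must not come right after digit + "/"
    (\d+(?:\.\d+)?)      # group 1: integer or decimal value
    (?!/?\d)             # not followed by a digit or by "/" + digit
    \s*(\w+)             # group 2: the unit
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Every part of the date is rejected by a lookaround, so 72.2 is captured.
print(WEIGHT_RE.findall("Weight as of 11/11/1900 72.2 kg (159 lb 1.6 oz)"))
# => [('72.2', 'kg')]

# A line that contains only a date yields no match at all.
print(WEIGHT_RE.findall("Weight as of 11/11/1900"))
# => []
```

In the date 11/11/1900, the first 11 is rejected by the lookahead (a / and a digit follow it), while the second 11 and 1900 are rejected by the lookbehind (a digit and / precede them).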

See the Python demo:

import re

# Backslashes in the sample are doubled so that "\F\" stays literal text instead of an invalid escape sequence.
text = """Weight Measured: 80.7 kg (11/27/1900 24:59:00)\nPulse 64 \\F\\ Temp 37.3?C (99.1 ?F) \\F\\ Wt 101.2 kg (223 lb)\nWeight as of 11/11/1900 72.2 kg (159 lb 1.6 oz)\nResp. rate 16, height 177.8 cm (5' 10"), weight 84.7 kg (186 lb|\n11.2 oz)\nAnd one extra weight example 100lbs"""
# Group 1 is the numeric value, group 2 is the unit.
print(re.findall(r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)', text))
# => [('80.7', 'kg'), ('101.2', 'kg'), ('72.2', 'kg'), ('84.7', 'kg'), ('100', 'lbs')]
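On the end-of-line part of the question, a short sketch of how the pattern behaves around line breaks may help. The inputs are made up, and the same_line variant (horizontal whitespace in place of \s*) is an assumption of this sketch, not part of the answer above.

```python
import re

pattern = r'(?i)\bw(?:eigh)?t\b.{1,25}?\b(?<!\d/)(\d+(?:\.\d+)?)(?!/?\d)\s*(\w+)'
# Variant for this sketch only: allow just horizontal whitespace before the unit.
same_line = pattern.replace(r'\s*', r'[^\S\r\n]*')

# "." does not match a newline by default, so the keyword and the value
# must sit on the same line...
print(re.findall(pattern, "Weight:\n80 kg"))        # => []
# ...unless re.S (DOTALL) is set.
print(re.findall(pattern, "Weight:\n80 kg", re.S))  # => [('80', 'kg')]

# "\s*" does match a newline, so the unit may be taken from the next line.
print(re.findall(pattern, "weight 80\nkg"))         # => [('80', 'kg')]
# The variant keeps the value and the unit on one line.
print(re.findall(same_line, "weight 80\nkg"))       # => []
```

Whether a unit on the following line should count is a data question. Splitting the text with str.splitlines() and matching line by line is another way to keep every match within a single line.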
