
Python3 regex findall

Here is my issue. Given the list below:

a = ['COP' , '\t\t\t', 'Basis', 'Notl', 'dv01', '6m', '9m', '1y',
     '18m', '2y', '3y', "15.6", 'mm', '4.6', '4y', '5y', '10', 'mm',
     '4.6', '6y', '7y', '8y', '9y', '10y', '20y', 'TOTAL', '\t\t9.2' ]

I'm trying to get output like the sample below. The most important point: each row starts with a term ending in "y" or "m", and a number follows it only if one is present in the list. Example: ('3y', '15.6', '')

SAMPLE OUTPUT (ignore the tuple structure, I just want the values)

('6m', '', '')
('9m', '', '')
('1y', '', '')
('18m', '', '')
('2y', '', '')
('3y', '15.6', '')
('4y', '', '')
('5y', '10', '')
('6y', '', '')
('7y', '', '')
('8y', '', '')
('9y', '', '')
('10y', '', '')
('20y', '', '')

I used the following regex, which should have matched:

  1. all numbers followed by "y" or "m" => (\b\d+[ym]\b)
  2. and then any number (integer or not) if it appears (meaning zero or more times) => (\b[0-9]+.*[0-9]*\b)

Here is what I did, using Python3 regex and re.findall(), but I still got no result:

import re

rule2 = re.compile(r"(\b\d+[ym]\b)(\b[0-9]+.*[0-9]*\b)+")
a_str = " ".join(a)
OUT2 = re.findall(rule2, a_str)
print(OUT2)
# OUT2 >>[]

Why am I not getting the correct result?

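You cannot use two word boundaries in a row. \b is zero-width, so \b\b consumes nothing, and the second group would have to start right after the "y" or "m", where there is a space. Since the items are separated by characters that are neither letters nor digits, use \W+ between the groups instead.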
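To illustrate the point about word boundaries, here is a minimal sketch on a short fragment of the joined string. The name `sample` and the simplified second group `[0-9]+` are assumptions made for the demonstration, not part of the original code.

```python
import re

# Illustrative fragment of the joined string (an assumption for this demo).
sample = "3y 15.6 mm"

# \b is zero-width: after "3y" the next character is a space,
# so [0-9]+ has nothing to match and the whole pattern fails.
two_boundaries = re.findall(r"(\b\d+[ym]\b)(\b[0-9]+)", sample)

# \W+ consumes the separator between the two tokens.
with_separator = re.findall(r"\b(\d+[ym])\W+([0-9]+)", sample)

print(two_boundaries)   # []
print(with_separator)   # [('3y', '15')]
```

The second result stops at '15' only because this simplified group does not handle the decimal part yet.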

Then escape the dot and make it optional, or you're not going to match 10. Don't use .* as it will match too much (regex greediness).
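A small sketch of the difference, assuming a made-up `sample` string that mixes a decimal and an integer. It compares the greedy `.*` version with an escaped dot that is required and one that is optional.

```python
import re

# Made-up input for the demo: one decimal, one integer, one decimal.
sample = "15.6 mm 10 mm 4.6"

# Unescaped ".*" is greedy and swallows everything up to the end of the string.
greedy = re.search(r"[0-9]+.*[0-9]*", sample).group()

# An escaped but mandatory dot skips the integer "10".
required_dot = re.findall(r"[0-9]+\.[0-9]*", sample)

# An escaped, optional dot matches both integers and decimals.
optional_dot = re.findall(r"[0-9]+\.?[0-9]*", sample)

print(greedy)        # 15.6 mm 10 mm 4.6
print(required_dot)  # ['15.6', '4.6']
print(optional_dot)  # ['15.6', '10', '4.6']
```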

The following yields more or less what you're looking for (note that strictly matching numbers, integers or floats, is trickier than this, so it isn't perfect):

import re

# a is the list defined in the question
rule2 = re.compile(r"\b(\d+[ym])\W+([0-9]+\.?[0-9]*)\b")
a_str = " ".join(a)
OUT2 = re.findall(rule2, a_str)
print(OUT2)

[('3y', '15.6'), ('5y', '10')]
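If you also want the rows that have no number, as in the sample output, one possible variation is to make the number part optional. This is only a sketch: the name `rule3` and the `(?!\S)` lookahead (the number must be followed by whitespace or the end of the string, so that the 9 in "9m" is not taken as a value) are my own assumptions, and the rows come out as 2-tuples rather than 3-tuples.

```python
import re

a = ['COP', '\t\t\t', 'Basis', 'Notl', 'dv01', '6m', '9m', '1y',
     '18m', '2y', '3y', "15.6", 'mm', '4.6', '4y', '5y', '10', 'mm',
     '4.6', '6y', '7y', '8y', '9y', '10y', '20y', 'TOTAL', '\t\t9.2']
a_str = " ".join(a)

# Group 1: a term such as "6m" or "10y".
# Optional part: separator, then an integer or decimal that is a whole token.
rule3 = re.compile(r"\b(\d+[ym])\b(?:\W+(\d+(?:\.\d+)?)(?!\S))?")

# findall returns '' for the second group when the optional part did not match.
rows = rule3.findall(a_str)
for row in rows:
    print(row)
```

With the list from the question this prints one row per term, with '15.6' next to '3y', '10' next to '5y', and an empty string everywhere else.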

