
Python regular expression - extracting float pattern

I am trying to extract a particular "float" from a string that contains multiple formatted "integers", "floats" and dates. The particular "float" in question is preceded by some standardized text.

String sample

my_string = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
"""

I have been able to extract the desired float, 2864.35, from my_string, but if this particular float changes in format, or another float with the same format shows up, my script won't return the desired result.

import re

regex = r"(\d+\.\d+)"
matches = re.findall(regex, my_string)
for match in matches:
    print(match)
  • It might truncate the desired float because of inconsistent numerical formatting
  • It might print two floats because the numerical pattern of an undesired float is too similar to be filtered out by the current regular expression regex
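Both problems can be reproduced on the sample string itself. The snippet below is only an illustration: it reuses my_string and the pattern from above, and shows that the BTC figure is cut down to its last group of digits while still being returned next to the desired value.

```python
import re

my_string = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
"""

# \d+ cannot cross a comma, so "11,202,782.0" is truncated to "782.0",
# and nothing ties the match to the Soles line, so two floats come back.
matches = re.findall(r"(\d+\.\d+)", my_string)
print(matches)  # ['2864.35', '782.0']
```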

Desired return from the regular expression regex

  • a float with a flexible integer part; the comma is sometimes omitted, i.e. 45000.50 in some cases and 45,000.50 in others
  • unique line identifier: Soles, which could be upper or lower case
  • line identifier: the float is prefixed by :
  • it should only return one float

Some variations of the desired float (second line of the string only)

What you see below are three examples of the same line, the second line in my_string. The regex should match only line number two, despite variations such as soles or Soles.

  • 💵Soles in mDm : 2864.35⬇
  • soles MDM: 2,864.35
  • Soles in mdm :2,864.355

Any assistance in editing or rewriting the current regular expression regex is greatly appreciated.

EDIT - Hmmm... If it has to follow soles, then hopefully this helps.

Try these. Granted, my console can't take the extra characters, but based on your input:

>>> import re
>>> my_string = """03/14/2019 07:07 AM
Soles in mDm : 2864.35
BTC purchase in mdm: 11,202,782.0
Soles in mDm : 2864.35
soles MDM: 2,864.35
Soles in mdm :2,864.355
"""


>>> re.findall(r'(?i)soles[\S\s]*?([\d]+[\d,]*\.[\d]+)', my_string)

#Output
['2864.35', '2864.35', '2,864.35', '2,864.355']



>>> re.findall(r'[Ss]oles[\S\s]*?([\d]+[\d,]*\.[\d]+)', my_string)

#Output
['2864.35', '2864.35', '2,864.35', '2,864.355']
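One caveat with the patterns above: `[\S\s]*?` can run across line breaks, so a Soles line without a float would pick up a number from a later line. Below is a minimal sketch of a variant that stays on one line and returns a single result. It assumes the label is always followed by a colon on the same line; the names `SOLES_RE` and `extract_soles` are made up for illustration.

```python
import re

# Assumption: "soles", then anything except a colon or newline, then ":",
# optional spaces or tabs, then a float with optional thousands separators.
SOLES_RE = re.compile(r"soles[^\n:]*:[ \t]*(\d[\d,]*\.\d+)", re.IGNORECASE)

def extract_soles(text):
    """Return the first float after a 'soles ...:' label as a string, or None."""
    m = SOLES_RE.search(text)
    return m.group(1) if m else None

sample = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
"""
print(extract_soles(sample))  # 2864.35
```

Because `re.search` stops at the first hit, only one float comes back even when several Soles lines are present.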

If you want to match multiple instances, add the g (global) flag where the regex engine supports it; otherwise only the first instance is matched. Python has no g flag: use re.findall or re.finditer for all matches, and re.search for only the first.

REGEX

(?<=:)\s?([\d,]*\.\d+)

With Python,

import re

# A float (with optional thousands separators) preceded by ":" and at most one whitespace character
regex = r"(?<=:)\s?([\d,]*\.\d+)"

test_str = ("\n"
    "    💵Soles in mDm : 2864.35⬇\n"
    "    soles MDM: 2,864.35\n"
    "    Soles in mdm :2,864.355\n")

# re.search returns only the first match
match = re.search(regex, test_str, re.IGNORECASE)

if match:
    print("Match was found at {start}-{end}: {match}".format(start=match.start(), end=match.end(), match=match.group()))

    # Capture groups are numbered from 1
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {group_num} found at {start}-{end}: {group}".format(group_num=group_num, start=match.start(group_num), end=match.end(group_num), group=match.group(group_num)))
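If every match is needed as a number rather than a string, one possible follow-up is to iterate with `re.finditer` and strip the thousands separators before calling `float`. This is only a sketch and assumes the input contains nothing but Soles lines; the pattern keys on the colon alone, so it would also match the BTC line from the question.

```python
import re

regex = r"(?<=:)\s?([\d,]*\.\d+)"

test_str = ("💵Soles in mDm : 2864.35⬇\n"
            "soles MDM: 2,864.35\n"
            "Soles in mdm :2,864.355\n")

# Remove the commas so that float() accepts values such as "2,864.35"
values = [float(m.group(1).replace(",", "")) for m in re.finditer(regex, test_str)]
print(values)  # [2864.35, 2864.35, 2864.355]
```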
