Python regular expression - extracting float pattern
I am trying to extract a particular "float" from a string that contains multiple formatted "integers", "floats" and dates. The particular "float" in question is preceded by some standardized text.
my_string = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
"""
I have been able to extract the desired float pattern, 2864.35, from my_string, but if this particular float changes in pattern, or another float with the same format shows up, my script won't return the desired result.
import re

regex = r"(\d+\.\d+)"
matches = re.findall(regex, my_string)
for match in matches:
    print(match)
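For reference, a minimal sketch of what this pattern returns for the sample string. It shows a second, unwanted hit that comes from the tail of the BTC amount:

```python
import re

my_string = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
"""

# The pattern knows nothing about "Soles", so it matches any digits.digits run.
regex = r"(\d+\.\d+)"
matches = re.findall(regex, my_string)
print(matches)  # ['2864.35', '782.0']
```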
The regex should filter out every other number and return only the float that follows Soles (it could be upper/lower case) and the colon :.
The second line in my_string can appear in several variations of the same line. The regex should return only line number two despite any variations such as soles or Soles.
Any assistance in editing or re-writing the current regular expression regex is greatly appreciated.
EDIT - Hmmm... If it has to follow soles, then hopefully this helps.
Try these. Granted, my console can't take the extra characters, but based on your input:
>>> import re
>>> my_string = """03/14/2019 07:07 AM
Soles in mDm : 2864.35
BTC purchase in mdm: 11,202,782.0
Soles in mDm : 2864.35
soles MDM: 2,864.35
Soles in mdm :2,864.355
"""
>>> re.findall(r'(?i)soles[\S\s]*?([\d]+[\d,]*\.[\d]+)', my_string)
#Output
['2864.35', '2864.35', '2,864.35', '2,864.355']
>>> re.findall(r'[Ss]oles[\S\s]*?([\d]+[\d,]*\.[\d]+)', my_string)
#Output
['2864.35', '2864.35', '2,864.35', '2,864.355']
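One caveat: [\S\s]*? can cross line breaks, so a soles line without a float would pick up a number from a later line. Below is a minimal sketch of a stricter variant. It assumes the value always sits on the same line as soles, right after a colon, and that commas are thousands separators. The names SOLES_RE and extract_soles are hypothetical and not part of the original answer.

```python
import re

# Assumption: "soles", then anything except a newline or colon, then ":",
# optional spaces/tabs, then a number with optional thousands commas.
SOLES_RE = re.compile(r"(?i)soles[^\n:]*:[ \t]*(\d[\d,]*\.\d+)")

def extract_soles(text):
    """Return the floats that follow 'soles ... :' on the same line."""
    return [float(m.replace(",", "")) for m in SOLES_RE.findall(text)]

sample = """03/14/2019 07:07 AM
💵Soles in mDm : 2864.35⬇
🔶BTC purchase in mdm: 11,202,782.0⬇
soles MDM: 2,864.35
Soles in mdm :2,864.355
"""
print(extract_soles(sample))  # [2864.35, 2864.35, 2864.355]
```

Stripping the commas before calling float() gives numeric values rather than strings, which may or may not be what you need.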
If you want to match multiple instances, add the g flag; otherwise it will only match a single instance. (Python has no g flag: re.search returns only the first match, while re.findall and re.finditer return all of them.)
REGEX
(?<=:)\s?([\d,]*\.\d+)
With Python:
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re

regex = r"(?<=:)\s?([\d,]*\.\d+)"

test_str = ("\n"
    " 💵Soles in mDm : 2864.35⬇\n"
    " soles MDM: 2,864.35\n"
    " Soles in mdm :2,864.355\n")

matches = re.search(regex, test_str, re.IGNORECASE)

if matches:
    print ("Match was found at {start}-{end}: {match}".format(start = matches.start(), end = matches.end(), match = matches.group()))

    for groupNum in range(0, len(matches.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = matches.start(groupNum), end = matches.end(groupNum), group = matches.group(groupNum)))
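Since re.search stops at the first match, here is a short sketch of collecting every match with re.finditer, reusing the same regex and test string. The variable name values is an illustrative choice.

```python
import re

regex = r"(?<=:)\s?([\d,]*\.\d+)"

test_str = ("\n"
    " 💵Soles in mDm : 2864.35⬇\n"
    " soles MDM: 2,864.35\n"
    " Soles in mdm :2,864.355\n")

# finditer yields one match object per occurrence; group(1) is the number.
values = [m.group(1) for m in re.finditer(regex, test_str)]
print(values)  # ['2864.35', '2,864.35', '2,864.355']
```

Note that this pattern keys on the colon only, so on the original my_string it would also return 11,202,782.0 from the BTC line.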