Python regex: extract a number from a string, unknown number format

Question

I need to extract the first number from a string, but I don't know the exact format of the number.

The number could be in any of the following formats: a decimal such as 1.224, a number with an unknown number of commas such as 3,455,000, a percentage such as 45%, or just an integer such as 5.

The input would be something like blah blah $ 2,400 or blah blah 45% or blah blah $1.23 or blah blah 7.

It would be interesting if the solution were intelligent enough to handle word numbers too, like blah blah seven.

I don't need the dollar sign, just the number.

Answer 1

While this problem has many cases, here is a solution that handles most of them using a regex and the re module:

import re

def extractVal(s):
    return re.sub(r'^[^0-9$\-]*| .*$', '', s)

(1) It removes all leading characters that are not a digit (0-9), $, or -.

(2) It removes everything from the first remaining space to the end of the string (after (1)).

Here's some data in action:

>>> data = ['blah $50,000 10', 'blah -1.224 blah', 'blah 3,455,000 blah', 'blah 45% 10 10 blah', '5 6 4']
>>> print(list(map(extractVal,data)))
['$50,000', '-1.224', '3,455,000', '45%', '5']

This solution assumes that the first number is followed by a space or by the end of the string.

We can go further, as others have stated, by converting these strings into numbers:

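def valToInt(s):
    if '%' in s:
        # Drop the trailing '%' and convert the percentage to a fraction.
        a = float(s[:-1]) / 100
    else:
        # Strip thousands separators and the dollar sign before converting.
        a = float(re.sub(r'[,$]', '', s))
    # Return an int when the value is whole, otherwise a float.
    return int(a) if a == int(a) else a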

This results in (with the map() function again):

[50000, -1.224, 3455000, 0.45, 5]
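The assumption that the first number is followed by a space can be relaxed by searching for the number itself instead of stripping what surrounds it. The following is only a sketch: NUMBER_RE and extract_first are hypothetical names, and the pattern assumes commas appear only as thousands separators and that a leading hyphen always means a minus sign.

```python
import re

# Hypothetical pattern: optional minus sign, digits, optional ",ddd" groups,
# optional decimal part, optional percent sign.
NUMBER_RE = re.compile(r'-?\d+(?:,\d{3})*(?:\.\d+)?%?')

def extract_first(s):
    """Return the first number-like substring of s, or None if there is none."""
    m = NUMBER_RE.search(s)
    return m.group(0) if m else None

for sample in ['blah 45%, then 10', 'total: $3,455,000.', 'blah -1.224 blah']:
    print(extract_first(sample))
```

Because the match stops at the last character that belongs to the number, trailing punctuation such as a comma or full stop is left out, and the dollar sign is never captured.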

Answer 2

If you insist on a regex, then this should work (limited to the cases you mentioned):

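import re

# Digits, optionally followed by comma- or dot-separated digit groups,
# so that 3,455,000 is matched whole rather than cut off at 3,455.
rgx = re.compile(r'\d+(?:[,.]\d+)*')
assert rgx.search("blah blah $ 2,400").group() == "2,400"
assert rgx.search("blah blah 45%").group() == "45"
assert rgx.search("blah blah $1.23").group() == "1.23"
assert rgx.search("blah blah 7").group() == "7"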

As for blah blah seven, I do not think a regex would cut it (at least not for anything more complex than a single digit).
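For the simple single-word case, a lookup table combined with a word-boundary regex is one possible approach. This is a minimal sketch, not a general solution: WORD_VALUES and first_word_number are hypothetical names, the table covers only zero through ten, and compound phrases such as "twenty one" are out of scope.

```python
import re

# Hypothetical lookup table; extend it as needed.
WORD_VALUES = {
    'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
    'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10,
}

# \b on both sides keeps "one" from matching inside "none" or "seven"
# from matching inside "seventeen".
WORD_RE = re.compile(r'\b(' + '|'.join(WORD_VALUES) + r')\b', re.IGNORECASE)

def first_word_number(s):
    """Return the value of the first number word in s, or None."""
    m = WORD_RE.search(s)
    return WORD_VALUES[m.group(1).lower()] if m else None

print(first_word_number('blah blah seven'))
```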

Answer 3

To extract the first number from a string, in any of these formats, you could use re.findall():

import re

strings = ['45% blah 43%', '1.224 blah 3.2', '3,455,000 blah 4,3', '$1.2 blah blah $ 2,400', '3 blah blah 7']

for string in strings:
    # findall() returns every match in order; index 0 is the first number.
    first_match = re.findall(r'[0-9$,.%]+\d*', string)[0]
    print(first_match)

Which outputs:

45%
1.224
3,455,000
$1.2
3
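Since only the first match is needed, re.search() may be a better fit than re.findall(): it stops at the first match and returns None when nothing is found, whereas indexing an empty findall() result raises IndexError. The sketch below keeps the pattern from this answer unchanged; first_match is a hypothetical helper name.

```python
import re

def first_match(string, pattern=r'[0-9$,.%]+\d*'):
    """Return the first substring matching pattern, or None if there is no match."""
    m = re.search(pattern, string)
    return m.group(0) if m else None

print(first_match('45% blah 43%'))
print(first_match('no numbers here'))
```

Note that this pattern also matches stray punctuation, so a comma or full stop appearing before the first digit would be returned instead of the number.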

Answer 4

Assuming you want an actual number, and that percentages should be converted to a decimal fraction:

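import re

str_ = "blah blah $ 2,400"
# Start the match on a digit so a lone "," or "." is never captured.
match = re.search(r"(\d[\d,.]*)\s*(%?)", str_)
number = None
# re.search() returns None when nothing matches, so check before unpacking.
if match is not None:
    number, is_percent = match.groups()
    number = float(number.replace(",", ""))
    if is_percent:
        number /= 100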
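The same idea can be wrapped in a function and run over the examples from the question. This is a sketch under a few assumptions: parse_number is a hypothetical name, commas are treated purely as thousands separators, and a malformed run such as 1.2.3 would still raise ValueError in float().

```python
import re

def parse_number(text):
    """Return the first number in text as a float, or None if there is none.

    Percentages are converted to fractions (45% becomes 0.45).
    """
    match = re.search(r"(\d[\d,.]*)\s*(%?)", text)
    if match is None:
        return None
    digits, percent = match.groups()
    value = float(digits.replace(",", ""))
    return value / 100 if percent else value

for sample in ["blah blah $ 2,400", "blah blah 45%", "blah blah $1.23", "blah blah 7"]:
    print(sample, "->", parse_number(sample))
```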


Answer 1: score 2, 2018-06-30 06:03:44
Answer 2: score 1, 2018-06-30 06:03:37
Answer 3: score 1, accepted, 2018-06-30 06:03:47
Answer 4: score 1, 2018-06-30 06:05:24
