How can I extract a two-digit number from a sentence using a regular expression?

I am trying to write a function that extracts only the two-digit integer from a sentence matched by a specific regular expression.

import re

def extract_number(message_text):
    regex_expression = 'What are the top ([0-9]{2}) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.finditer(message_text)
    for match in matches:
        return match.group()

    # if there were no matches, return None
    return None

So that when I print

message_text= 'What are the top 54 trends on facebook today'
print(extract_number(message_text))

I will get the number 54. If I use the version below instead, I get whatever text I put in place of (.+). Why won't it work for numbers?

def extract_number(message_text):
    regex_expression = 'What are the top (.+) trends on facebook'
    regex= re.compile(regex_expression)
    matches = regex.finditer(message_text)
    for match in matches:
        return match.group()

message_text= 'What are the top fifty trends on facebook today'
print(extract_number(message_text))

The only problem with both of your snippets is that you're returning the overall match rather than the capture-group result you're interested in:

return match.group()

is the same as return match.group(0), i.e., it returns the overall match, which in your case is everything the whole pattern matched: What are the top 54 trends on facebook (the trailing " today" is not covered by the pattern, so it is not included).

By contrast, you want index 1, i.e., what the first capture group matched. That is the first subexpression enclosed in (...), here ([0-9]{2}):

return match.group(1)
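
As a quick illustration of the difference, here is a minimal sketch that reuses the pattern and the sample sentence from the question and compares the two indices. The variable names are only for illustration.

```python
import re

pattern = re.compile('What are the top ([0-9]{2}) trends on facebook')
match = pattern.search('What are the top 54 trends on facebook today')

# group(0) is the same as group(): the text matched by the entire pattern.
whole_match = match.group(0)
# group(1) is the text matched by the first (...) subexpression only.
first_group = match.group(1)

print(whole_match)
print(first_group)
```

The first line printed is the overall match, the second is just the two digits.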

To put it all together:

import re

def extract_number(message_text):
    regex_expression = 'What are the top ([0-9]{2}) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.finditer(message_text)
    # (See bottom of this answer for a loop-less alternative.)
    for match in matches:
        return match.group(1)  # index 1 returns what the 1st capture group matched

    # if there were no matches, return None
    return None

message_text= 'What are the top 54 trends on facebook today'
print(extract_number(message_text))

This yields the desired output:

54

Note: As @EvanL00 points out, because only one match is ever needed, using regex.finditer() followed by a for loop that unconditionally returns in its first iteration is unnecessary and may obscure the intent of the code; the simpler and clearer approach is:

match = regex.search(message_text) # Get first match only.
if match:
    return match.group(1)

This should work for both numbers and words:

import re

def extract_number(message_text):
    regex_expression = 'What are the top ([a-zA-Z0-9]+) trends on facebook'
    regex = re.compile(regex_expression)
    matches = regex.findall(message_text)
    if matches:
        return matches[0]

message_text= 'What are the top fifty trends on facebook today'
print(extract_number(message_text))
message_text= 'What are the top 50 trends on facebook today'
print(extract_number(message_text))
message_text= 'What are the top -- trends on facebook today'
print(extract_number(message_text))

Output:

fifty
50
None
