Java regex not picking up "+"

Here is my problem. I'm working on a LeetCode exercise and trying to implement an atoi method.

public int myAtoi(String s) {
    System.out.println(s.matches("^[^ -0123456789].*")); //this is the regex I am debugging
    if(s.matches("^[^ -0123456789].*")){
        return 0;
    }
    int solution = 0;
    s = s.replaceAll("[^-0123456789.]","");
    solution = 0;
    boolean negative = false;
    
    if(s.charAt(0) == '-'){
        s = s.replaceAll("-","");
        negative = true;
    }
    
    if(s.matches("^[0-9]?[.][0-9]+")){
        s = s.substring(0, s.indexOf('.'));
        System.out.println(s);
    }
    
    for(int i = s.length(); i > 0; i--){
        solution = solution + (s.charAt(s.length() - i) - 48) * (int)Math.pow(10,i - 1);
    }
    
    if(negative) solution = solution * -1;
    
    if(negative && solution > 0) return (int) Math.pow(-2,31);
    if(!negative && solution < 0) return (int) Math.pow(2,31) - 1;
    
    return solution;
}

A screenshot of the output section was attached here in case I have missed something (image not reproduced); a text description follows.

When the input is "+-12", the output is supposed to be 0 (as an int). This is because of the requirement that "if the string does not start with a number, a space, or a negative sign", we return 0.

The code that is supposed to handle this starts at line 4 and looks like:

if(s.matches("^[^ -0123456789].*")){
    return 0;
}

What is wrong with my regex?

  • We don't really have to use regular expressions to solve this problem; a plain loop is simpler, and its time complexity is easier to reason about.

    • For instance, if(s.matches("^[0-9]?[.][0-9]+")){ adds yet another scan of the string on top of the earlier matches and replaceAll calls. (Note that the ? here is the optional, zero-or-one quantifier, not a lazy one; on its own this pattern does not cause quadratic behavior.)
  • We can just loop through the string once, which is O(N), and handle each case with a few conditions:

class Solution {
    public static final int myAtoi(
        String s
    ) {
        s = s.trim();
        char[] characters = s.toCharArray();
        int sign =  1;
        int index = 0;

        if (
            index < characters.length &&
            (characters[index] == '-' || characters[index] == '+')
        ) {
            if (characters[index] == '-') {
                sign = -1;
            }

            ++index;
        }

        int num = 0;
        int bound = Integer.MAX_VALUE / 10;

        while (
            index < characters.length &&
            characters[index] >= '0' &&
            characters[index] <= '9'
        ) {
            final int digit = characters[index] - '0';

            if (num > bound || (num == bound && digit > 7)) {
                return sign == 1 ? Integer.MAX_VALUE : Integer.MIN_VALUE;
            }

            num *= 10;
            num += digit;
            ++index;
        }

        return sign * num;

    }
}
  • Here is a C++ version, in case you are interested:
// Most of these headers are already included on LeetCode;
// Can be removed;
#include <cctype>   // std::isdigit
#include <climits>  // INT_MAX, INT_MIN
#include <iostream>
#include <cstdint>
#include <vector>
#include <string>


// The following block might trivially improve the exec time;
// Can be removed;
static const auto improve_runtime = []() {
    std::ios::sync_with_stdio(false);
    std::cin.tie(NULL);
    std::cout.tie(NULL);
    return 0;
}();

#define MAX INT_MAX
#define MIN INT_MIN
using ValueType = std::int_fast32_t;

struct Solution {
    static int myAtoi(
        const std::string& str
    ) {
        const ValueType len = std::size(str);
        ValueType sign = 1;
        ValueType index = 0;

        while (index < len && str[index] == ' ') {
            index++;
        }

        if (index == len) {
            return 0;
        }

        if (str[index] == '-') {
            sign = -1;
            ++index;

        } else if (str[index] == '+') {
            ++index;
        }

        std::int_fast64_t num = 0;

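        // Use <= so that a digit read after num reaches INT_MAX pushes num past
        // the limit; otherwise inputs such as "-21474836470" are never clamped
        // to INT_MIN.
        while (index < len && num <= MAX &&
               std::isdigit(static_cast<unsigned char>(str[index]))) {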
            ValueType digit = str[index] - '0';
            num *= 10;
            num += digit;
            index++;
        }

        if (num > MAX) {
            return sign == 1 ? MAX : MIN;
        }

        return sign * num;
    }
};

// int main() {
//     std::cout << Solution().myAtoi("words and 987") << "\n";
//     std::cout << Solution().myAtoi("4193 with words") << "\n";
//     std::cout << Solution().myAtoi("   -42") << "\n";
// }

Regarding your question

What is wrong with my regex?
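
As far as I can tell, the immediate cause is the hyphen inside the character class. In [^ -0123456789], the sequence " -0" is read as a range from the space character (0x20) to '0' (0x30), and '+' (0x2B) falls inside that range. The negated class therefore refuses to match '+', so "+-12" is not rejected. Escaping the hyphen (or moving it to the end of the class) makes it a literal '-'. A minimal sketch of the difference follows; the class name CharClassRangeDemo and the helper rejectedBy are made up for illustration and are not part of the original code:

```java
public class CharClassRangeDemo {
    // Pattern from the question: " -0" inside the brackets is a range
    // from ' ' (0x20) to '0' (0x30), which includes '+' (0x2B).
    static final String ORIGINAL = "^[^ -0123456789].*";

    // Assumed fix: escape the hyphen so it is a literal '-'.
    static final String ESCAPED = "^[^ \\-0123456789].*";

    // Hypothetical helper: true when the "return 0" branch would be taken.
    static boolean rejectedBy(String pattern, String s) {
        return s.matches(pattern);
    }

    public static void main(String[] args) {
        System.out.println(rejectedBy(ORIGINAL, "+-12")); // false: '+' is inside the range
        System.out.println(rejectedBy(ESCAPED, "+-12"));  // true: '+' is no longer excluded
        System.out.println(rejectedBy(ESCAPED, "-12"));   // false: a leading '-' is allowed
        System.out.println(rejectedBy(ESCAPED, " 12"));   // false: a leading space is allowed
    }
}
```

The same range also swallows other characters between the space and '0', such as '.', '(' and '*', so inputs starting with those would slip past the original check as well.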

  • If you'd like to see how a regular expression solution works, maybe this concise Python version would help:
import re

class Solution:
    def myAtoi(self, s: str) -> int:
        MAX, MIN = 2147483647, -2147483648
        DIGIT_PATTERN = re.compile(r'^\s*[+-]?\d+')
        s = re.findall(DIGIT_PATTERN, s)
        try:
            res = int(''.join(s))
        except ValueError:  # nothing matched, so int('') failed
            return 0
        if res > MAX:
            return MAX
        if res < MIN:
            return MIN
        return res
  • We could work around the expression ^\s*[+-]?\d+ by dividing it into two subexpressions, which would get rid of the optional quantifier ( ? ), yet that would be unnecessary (and would also go against the KISS principle).
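
For readers who would rather stay in Java, the regex approach from the Python version above could be sketched roughly as follows. This is only an illustration under a few assumptions: the class name RegexAtoi is made up, a capturing group is added around the sign and digits, and BigInteger is used so that very long digit runs can be clamped without overflowing. Like the Python pattern, \s skips any whitespace, not only spaces.

```java
import java.math.BigInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexAtoi {
    // Same shape as the Python pattern: leading whitespace, optional sign, digits.
    // The parentheses (an addition for this sketch) capture the sign and digits.
    private static final Pattern LEADING_INT = Pattern.compile("^\\s*([+-]?\\d+)");

    private static final BigInteger MAX = BigInteger.valueOf(Integer.MAX_VALUE);
    private static final BigInteger MIN = BigInteger.valueOf(Integer.MIN_VALUE);

    static int myAtoi(String s) {
        Matcher m = LEADING_INT.matcher(s);
        if (!m.lookingAt()) {
            return 0; // no number at the start of the string
        }
        // BigInteger avoids overflow while parsing arbitrarily long digit runs.
        BigInteger value = new BigInteger(m.group(1));
        if (value.compareTo(MAX) > 0) {
            return Integer.MAX_VALUE;
        }
        if (value.compareTo(MIN) < 0) {
            return Integer.MIN_VALUE;
        }
        return value.intValue();
    }

    public static void main(String[] args) {
        System.out.println(myAtoi("   -42"));          // -42
        System.out.println(myAtoi("4193 with words")); // 4193
        System.out.println(myAtoi("+-12"));            // 0
    }
}
```

The single-pass loop shown earlier is still the simpler choice; this sketch is only meant to show what the pattern does.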
