Regex capture groups vs. using split on dates, which is better?

On several occasions I've needed to capture the "day" part of a date formatted like "YYYY/DD/MM". My question is whether it is better to use a regex with a capturing group, or to just call split on "/" and take the second item in the array.

Functionally I realize that BOTH get the same result.

Memory-wise, I realize that they BOTH use the regex engine under the hood, and that in most cases I'm just discarding the array after reading the value for day using split. But technically I do the exact same thing after reading from the Match object as well. Are there any corner cases or tradeoffs I should be considering that I'm not? (Besides readability... "split" clearly wins there...)

===EDIT=== API-wise I'm limited to Groovy 1.5.0 for silly legacy reasons.

To clarify edmastermind's solution:

def nowCal = Calendar.instance 
def currentDay = nowCal.get(Calendar.DAY_OF_MONTH)

If it's always formatted the same way, I'd use the substring method of String to pull out what you want. A regex has to be compiled every time you use it, unless you keep the compiled Pattern around.
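As a minimal sketch of the fixed-position idea, assuming the question's "YYYY/DD/MM" layout with zero-padded fields (the class and method names are hypothetical, and plain Java is used here since it runs on the same JVM as Groovy):

```java
final class FixedOffsetDay {
    // Assumes the "YYYY/DD/MM" layout from the question: the day occupies
    // characters 5 and 6 (0-based), right after the "YYYY/" prefix.
    static String day(String date) {
        if (date == null || date.length() != 10) {
            throw new IllegalArgumentException("expected 10 characters: " + date);
        }
        return date.substring(5, 7);
    }

    public static void main(String[] args) {
        System.out.println(day("2012/25/05")); // prints 25
    }
}
```

This only holds while every field keeps its fixed width; a date such as "2012/5/05" would be rejected by the length check rather than silently misread.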

The code is more meaningful if you use a DateFormat: create one (for example a SimpleDateFormat) with the format of the string, then parse the string into a Date object. Performance-wise, this may have more overhead than your methods.

However, I want to point out that:
If you can trust the input String, then you can use whatever you want to get the result.
If you cannot, then just use DateFormat to parse the date string.

In Groovy, you can just do:

def date = '2012/05/25'

assert 25 == Date.parse( 'yyyy/MM/dd', date )[ Calendar.DAY_OF_MONTH ]

For Groovy 1.5.0, this would be (wrapped as a function):

int getDay( String date, String format='yyyy/MM/dd' ) {
  Calendar.instance.with {
    time = new java.text.SimpleDateFormat( format ).parse( date )
    get( Calendar.DAY_OF_MONTH )
  }
}

def date = '2012/05/25'
assert 25 == getDay( date )

For this case you should opt for the solution which yields the most readable and maintainable code, so I would go for the .split approach as opposed to the Regular Expression approach.

Suppose that a few months down the line the format of the date changes. You would then have to make slight changes to the regular expression. Although this might not be a daunting task, it might not be as straightforward for someone who has had limited exposure to regular expressions.

The .split("/") call is pretty intuitive, so it is easier for people without regular expression experience to understand and hence maintain.
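A rough sketch of the split approach, assuming the question's "YYYY/DD/MM" layout (the class name and the field-count check are illustrative additions, not part of the original answer):

```java
final class SplitDay {
    static String day(String date) {
        // String.split takes a regex; "/" has no special meaning in a
        // regex, so it matches a literal slash.
        String[] parts = date.split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected three '/'-separated fields: " + date);
        }
        return parts[1]; // second field is the day in "YYYY/DD/MM"
    }

    public static void main(String[] args) {
        System.out.println(day("2012/25/05")); // prints 25
    }
}
```

If the format changes, only the index (and possibly the separator) needs to be touched, which is the maintainability argument made above.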

If you need to do it with minimal resources, please consider using String#indexOf and String#substring. It is also important to note that running a benchmark will help you make the right decision.
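One way the String#indexOf and String#substring combination might look, again assuming the day is the field between the first and second slash (names are hypothetical):

```java
final class IndexOfDay {
    static String day(String date) {
        // Locate the two separators without building an array or a regex.
        int first = date.indexOf('/');
        int second = (first < 0) ? -1 : date.indexOf('/', first + 1);
        if (second < 0) {
            throw new IllegalArgumentException("expected two '/' separators: " + date);
        }
        return date.substring(first + 1, second);
    }

    public static void main(String[] args) {
        System.out.println(day("2012/25/05")); // prints 25
    }
}
```

Unlike fixed offsets, this tolerates fields that are not zero-padded, at the cost of two scans of the string.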

I think that using a regex is very concise here and communicates that you are only interested in the middle part. The split version is harder to read, because you don't know how many elements split returns unless you spell it out in a comment.
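For comparison, a sketch of the capture-group version, assuming the "YYYY/DD/MM" layout with zero-padded fields (the pattern and the names are illustrative choices, not taken from the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexDay {
    // Compiled once and reused; only the middle field is captured.
    private static final Pattern DATE = Pattern.compile("\\d{4}/(\\d{2})/\\d{2}");

    static String day(String date) {
        Matcher m = DATE.matcher(date);
        if (!m.matches()) { // matches() requires the whole input to fit
            throw new IllegalArgumentException("not in YYYY/DD/MM form: " + date);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(day("2012/25/05")); // prints 25
    }
}
```

The pattern documents the expected shape of the whole string, which is the readability point made above, but it still accepts impossible values such as a day of 99.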

Neither the simple regex approach nor the split approach will give you proper date validation. For that, consider using the parse function of a DateFormat, which will probably be the slowest of all.
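A hedged sketch of what validating through a DateFormat could look like. It assumes SimpleDateFormat with lenient parsing switched off, and it uses a ParsePosition to reject trailing garbage; the class and method names are made up for illustration:

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

final class StrictDay {
    static int dayOfMonth(String text, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject out-of-range values such as month 13
        ParsePosition pos = new ParsePosition(0);
        Date parsed = format.parse(text, pos); // returns null on failure
        if (parsed == null || pos.getIndex() != text.length()) {
            throw new IllegalArgumentException("not a valid " + pattern + " date: " + text);
        }
        Calendar cal = Calendar.getInstance();
        cal.setTime(parsed);
        return cal.get(Calendar.DAY_OF_MONTH);
    }

    public static void main(String[] args) {
        System.out.println(dayOfMonth("2012/05/25", "yyyy/MM/dd")); // prints 25
    }
}
```

With this, an input like "2012/13/45" is rejected, whereas split or a simple capture group would happily hand back "45" (or "13") as the day.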

Noticing the java tag: is it not feasible to call DAY_OF_MONTH to get the "day" part, rather than using a regex or splitting it?
