SimpleDateFormat leniency leads to unexpected behavior
I have found that the behavior of SimpleDateFormat::parse(String source) is (unfortunately) lenient by default: setLenient(true).
By default, parsing is lenient: If the input is not in the form used by this object's format method but can still be parsed as a date, then the parse succeeds.
If I set the leniency to false, the documentation says that with strict parsing, inputs must match this object's format. I used parsing with SimpleDateFormat without the lenient mode and, by mistake, I had a typo in the date (the letter o instead of the digit 0). Here is the brief working code:
// PASSED (year 199)
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.199o"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.199o")); //WTF?
To my surprise, this passed and no ParseException was thrown. I went further:
// PASSED (year 1990)
String string = "just a String to mess with SimpleDateFormat";
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.1990" + string));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.1990" + string));
Let's go on:
// FAILED on the 2nd line
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("o3.12.1990"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("o3.12.1990"));
Finally, the exception is thrown: Unparseable date: "o3.12.1990". I wonder where the difference in leniency lies, and why the last line of my first code snippet did not throw an exception. The documentation says:
With strict parsing, inputs must match this object's format.
My input clearly doesn't strictly match the format - I expect this parsing to be really strict. Why does this (not) happen?
Leniency is not about whether the entire input matches but whether the format matches. Your input can still be 3.12.1990somecrap and it would work.
The actual parsing is done in parse(String, ParsePosition), which you could use as well. Basically, parse(String) passes a ParsePosition that is set up to start at index 0, and when the parsing is done the current index of that position is checked.
If it's still 0, the start of the input didn't match the format, not even in lenient mode.
However, to the parser 03.12.199 is a valid date, and hence it stops after the character at index 8 - the position is no longer 0, and thus the parsing succeeds. If you want to check whether everything was parsed, you'd have to pass your own ParsePosition and check whether its index matches the length of the input.
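A minimal sketch of that check, assuming a hypothetical helper named parseWholeInput (it is not part of the JDK) and using uppercase MM for the month. It only relies on parse(String, ParsePosition) returning null on failure and on ParsePosition.getIndex():

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class WholeInputParse {

    // Hypothetical helper: returns the parsed Date only if the formatter
    // consumed the entire input; otherwise returns null.
    static Date parseWholeInput(SimpleDateFormat format, String input) {
        ParsePosition position = new ParsePosition(0);
        // parse(String, ParsePosition) returns null on failure instead of throwing.
        Date date = format.parse(input, position);
        if (date == null || position.getIndex() != input.length()) {
            return null;
        }
        return date;
    }

    public static void main(String[] args) {
        SimpleDateFormat format = new SimpleDateFormat("dd.MM.yyyy");
        format.setLenient(false);
        // Complete match: the whole string is consumed.
        System.out.println(parseWholeInput(format, "03.12.1990") != null);
        // The trailing 'o' is left unparsed, so the helper rejects it.
        System.out.println(parseWholeInput(format, "03.12.199o") != null);
        // Nothing can be parsed at the start of the input.
        System.out.println(parseWholeInput(format, "o3.12.1990") != null);
    }
}
```

Returning null mirrors what parse(String, ParsePosition) itself does; throwing a ParseException with position.getIndex() as the error offset would work just as well.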
If you use setLenient(false), it will still parse the date until the desired pattern is met. However, it will check whether the resulting date is a valid date or not. In your case, 03.12.199 is a valid date, so it will not throw an exception. Let's take an example to understand where setLenient(false) differs from setLenient(true), the default.
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.MM.yyyy");
System.out.println(simpleDateFormat.parse("31.02.2018"));
The above gives me the output: Sat Mar 03 00:00:00 IST 2018
But the code below throws a ParseException, as 31.02.2018 is not a valid/possible date:
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.MM.yyyy");
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("31.02.2018"));
Why does this (not) happen?
It's not very well explained in the documentation.
With lenient parsing, the parser may use heuristics to interpret inputs that do not precisely match this object's format. With strict parsing, inputs must match this object's format.
The documentation does help a bit, though, by mentioning that it is the Calendar object used by the DateFormat that is lenient. That Calendar object is not used for the parsing itself, but for interpreting the parsed values into a date and time (I am quoting the DateFormat documentation since SimpleDateFormat is a subclass of DateFormat).
SimpleDateFormat, no matter if lenient or not, will accept a 3-digit year, for example 199, even though you have specified yyyy in the format pattern string. The documentation says about year:
For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.
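To see the quoted rule in action, here is a small sketch. The helper parsedYear is hypothetical (my own name, not a JDK method), and Locale.ROOT is only passed so that the formatter uses a plain Gregorian calendar regardless of the machine's default locale:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class LiteralYearDemo {

    // Hypothetical helper: parses strictly and returns the calendar year of the result.
    static int parsedYear(String pattern, String input) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setLenient(false);
        try {
            Calendar calendar = new GregorianCalendar();
            calendar.setTime(format.parse(input));
            return calendar.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable: " + input, e);
        }
    }

    public static void main(String[] args) {
        // Four pattern letters: the digits are taken literally, however few there are.
        System.out.println(parsedYear("dd.MM.yyyy", "03.12.199"));
        System.out.println(parsedYear("MM/dd/yyyy", "01/11/12"));
        // Two pattern letters: the year is adjusted to a nearby century instead.
        System.out.println(parsedYear("MM/dd/yy", "01/11/12"));
    }
}
```

The first two calls should report the years 199 and 12 even in strict mode, which is exactly why the typo in the question goes unnoticed.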
DateFormat, no matter if lenient or not, accepts and ignores text after the parsed text, like the small letter o in your first example. It objects to unexpected text before or inside the text, as when in your last example you put the letter o in front. The documentation of DateFormat.parse says:
The method may not use the entire text of the given string.
As I indirectly said, leniency makes a difference when interpreting the parsed values into a date and time. So a lenient SimpleDateFormat will interpret 29.02.2019 as 01.03.2019 because there are only 28 days in February 2019. A strict SimpleDateFormat will refuse to do that and will throw an exception. The default lenient behaviour can lead to very surprising and downright inexplicable results. As a simple example, giving the day, month and year in the wrong order: 1990.03.12 will result in August 11 year 17 AD (2001 years ago).
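The rollover can be reproduced with a short sketch like the following. The helper roundTrip is a hypothetical name of mine; it parses with the given leniency, formats the result back with the same pattern, and returns null when a ParseException is thrown:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class LenientRollover {

    // Hypothetical helper: parse with the given leniency and format the result
    // back with the same pattern; null means the parser threw a ParseException.
    static String roundTrip(String input, boolean lenient) {
        SimpleDateFormat format = new SimpleDateFormat("dd.MM.yyyy", Locale.ROOT);
        format.setLenient(lenient);
        try {
            return format.format(format.parse(input));
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Lenient: day 29 of February 2019 rolls over into March.
        System.out.println(roundTrip("29.02.2019", true));
        // Strict: the same input is rejected.
        System.out.println(roundTrip("29.02.2019", false));
        // Lenient with the fields in the wrong order: day 1990 of month 3 of year 12.
        System.out.println(roundTrip("1990.03.12", true));
    }
}
```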
VGR already mentioned LocalDate from java.time, the modern Java date and time API, in a comment. In my experience java.time is so much nicer to work with than the old date and time classes, so let's give it a shot. Try a correct date string first:
DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.mm.yyyy");
System.out.println(LocalDate.parse("03.12.1990", dateFormatter));
We get:
java.time.format.DateTimeParseException: Text '03.12.1990' could not be parsed: Unable to obtain LocalDate from TemporalAccessor: {Year=1990, DayOfMonth=3, MinuteOfHour=12},ISO of type java.time.format.Parsed
This is because I used your format pattern string of dd.mm.yyyy, where lowercase mm means minute. When we read the error message closely enough, it does state that the DateTimeFormatter interpreted 12 as minute of hour, which was not what we intended. While SimpleDateFormat tacitly accepted this (even when strict), java.time is more helpful in pointing out our mistake. What the message only indirectly says is that it is missing a month value. We need to use uppercase MM for month. At the same time I am trying your date string with the typo:
DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.MM.yyyy");
System.out.println(LocalDate.parse("03.12.199o", dateFormatter));
We get:
java.time.format.DateTimeParseException: Text '03.12.199o' could not be parsed at index 6
Index 6 is where it says 199. It objects because we had specified 4 digits and are only supplying 3. The docs say:
The count of letters determines the minimum field width …
It would also object to unparsed text after the date. In short it seems to me that it gives you everything that you had expected.
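As a rough sketch of that behaviour, the snippet below runs the inputs from the question through LocalDate.parse with the corrected pattern. The helper parses is a hypothetical name of mine that simply turns a DateTimeParseException into false:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class JavaTimeStrictness {

    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("dd.MM.yyyy");

    // Hypothetical helper: true if the whole input parses as a LocalDate.
    static boolean parses(String input) {
        try {
            LocalDate.parse(input, FORMATTER);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("03.12.1990"));                // well-formed date
        System.out.println(parses("03.12.199o"));                // only 3 year digits
        System.out.println(parses("03.12.1990 just a String"));  // trailing text
        System.out.println(parses("o3.12.1990"));                // letter in the day field
    }
}
```

Only the first input should be accepted; the other three are rejected with a DateTimeParseException.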
DateFormat.setLenient documentation
java.time