
SimpleDateFormat leniency leads to unexpected behavior

I have found that SimpleDateFormat::parse(String source) is (unfortunately) lenient by default: setLenient(true).

By default, parsing is lenient: If the input is not in the form used by this object's format method but can still be parsed as a date, then the parse succeeds.

If I set leniency to false, the documentation says that with strict parsing, inputs must match this object's format. I used parsing with SimpleDateFormat with lenient mode turned off and, by mistake, had a typo in the date (the letter o instead of the digit 0). Here is the brief working code:

// PASSED (year 199)
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.199o"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.199o"));        //WTF?

To my surprise, this passed and no ParseException was thrown. I went further:

// PASSED (year 1990)
String string = "just a String to mess with SimpleDateFormat";

SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("03.12.1990" + string));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("03.12.1990" + string));

Let's go on:

// FAILED on the 2nd line
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.mm.yyyy");
System.out.println(simpleDateFormat.parse("o3.12.1990"));
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("o3.12.1990"));

Finally, the exception is thrown: Unparseable date: "o3.12.1990". I wonder what difference leniency actually makes, and why the last line of my first code snippet did not throw an exception. The documentation says:

With strict parsing, inputs must match this object's format.

My input clearly doesn't strictly match the format, and I expect this parsing to be really strict. Why does this (not) happen?

Leniency is not about whether the entire input matches but whether the format matches. Your input can still be 3.12.1990somecrap and it would work.

The actual parsing is done in parse(String, ParsePosition), which you could use as well. Basically, parse(String) passes a ParsePosition that is set up to start at index 0, and when the parsing is done the current index of that position is checked.

If it's still 0, the start of the input didn't match the format, not even in lenient mode.

However, to the parser 03.12.199 is a valid date, so it stops right after the character at index 8 (the letter o is never consumed). The resulting position isn't 0, and thus the parsing succeeds. If you want to check whether everything was parsed, you have to pass your own ParsePosition and check whether its index matches the length of the input.
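The check described above can be sketched as follows. This is only an illustration: the class name FullInputParse and the helper parseFully are made up for this example (they are not part of the JDK), and the pattern uses uppercase MM for the month rather than the mm from the question.

```java
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class FullInputParse {

    // Hypothetical helper: parses text and rejects it unless every character was consumed.
    public static Date parseFully(String pattern, String text) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false);
        ParsePosition position = new ParsePosition(0);
        Date date = format.parse(text, position);
        if (date == null) {
            // parse(String, ParsePosition) returns null on failure and records the error index.
            throw new ParseException("Unparseable date: \"" + text + "\"", position.getErrorIndex());
        }
        if (position.getIndex() != text.length()) {
            // Parsing succeeded but stopped early: some trailing text was left over.
            throw new ParseException("Unparsed text in: \"" + text + "\"", position.getIndex());
        }
        return date;
    }

    public static void main(String[] args) {
        String[] inputs = {"03.12.1990", "03.12.199o", "03.12.1990 trailing"};
        for (String input : inputs) {
            try {
                System.out.println(input + " -> " + parseFully("dd.MM.yyyy", input));
            } catch (ParseException e) {
                System.out.println(input + " -> rejected at index " + e.getErrorOffset());
            }
        }
    }
}
```

Note that this only catches leftover text. A three-digit year such as 03.12.199 on its own would still be accepted, because the whole input is consumed.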

If you use setLenient(false), it will still parse the input only until the pattern is satisfied. However, it will check whether the resulting date is a valid date. In your case, 03.12.199 is a valid date, so it will not throw an exception. Let's take an example to see where setLenient(false) differs from setLenient(true), the default.

SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.MM.yyyy"); 
System.out.println(simpleDateFormat.parse("31.02.2018"));

The above gives me the output: Sat Mar 03 00:00:00 IST 2018

But the code below throws a ParseException, as 31.02.2018 is not a valid/possible date:

SimpleDateFormat simpleDateFormat = new SimpleDateFormat("dd.MM.yyyy");
simpleDateFormat.setLenient(false);
System.out.println(simpleDateFormat.parse("31.02.2018"));

Why does this (not) happen?

It's not very well explained in the documentation.

With lenient parsing, the parser may use heuristics to interpret inputs that do not precisely match this object's format. With strict parsing, inputs must match this object's format.

The documentation does help a bit, though, by mentioning that it is the Calendar object used by the DateFormat that is lenient. That Calendar object is not used for the parsing itself, but for interpreting the parsed values into a date and time (I am quoting the DateFormat documentation since SimpleDateFormat is a subclass of DateFormat).

  • SimpleDateFormat, whether lenient or not, will accept a 3-digit year, for example 199, even though you have specified yyyy in the format pattern string. The documentation says about year:

    For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 AD.

  • DateFormat, whether lenient or not, accepts and ignores text after the parsed text, like the small letter o in your first example. It objects to unexpected text before or inside the text, as when you put the letter o in front in your last example. The documentation of DateFormat.parse says:

    The method may not use the entire text of the given string.

  • As I indirectly said, leniency makes a difference when interpreting the parsed values into a date and time. So a lenient SimpleDateFormat will interpret 29.02.2019 as 01.03.2019 because there are only 28 days in February 2019. A strict SimpleDateFormat will refuse to do that and will throw an exception. The default lenient behaviour can lead to very surprising and downright inexplicable results. As a simple example, giving the day, month and year in the wrong order: 1990.03.12 will result in August 11 of year 17 AD (2001 years ago).
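The three points above can be observed with a small sketch like the one below. The class name LeniencyDemo and the helper interpret are invented for illustration. The pattern uses uppercase MM so that the month is actually parsed, and the time zone is fixed to UTC only to keep the printed dates independent of the machine's settings.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LeniencyDemo {

    // Hypothetical helper: parses with dd.MM.yyyy and returns the interpreted date
    // as yyyy-MM-dd, or null if SimpleDateFormat throws a ParseException.
    public static String interpret(String text, boolean lenient) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat input = new SimpleDateFormat("dd.MM.yyyy", Locale.ROOT);
        input.setTimeZone(utc);
        input.setLenient(lenient);
        SimpleDateFormat output = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        output.setTimeZone(utc);
        try {
            Date date = input.parse(text);
            return output.format(date);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A 3-digit year followed by trailing text is accepted in both modes.
        System.out.println(interpret("03.12.199o", true));
        System.out.println(interpret("03.12.199o", false));
        // An impossible calendar date is where the two modes differ.
        System.out.println(interpret("29.02.2019", true));   // rolls over into March
        System.out.println(interpret("29.02.2019", false));  // null, parsing was refused
    }
}
```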

The solution

VGR already mentioned LocalDate from java.time, the modern Java date and time API, in a comment. In my experience java.time is so much nicer to work with than the old date and time classes, so let's give it a shot. Try a correct date string first:

    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.mm.yyyy");
    System.out.println(LocalDate.parse("03.12.1990", dateFormatter));

We get:

java.time.format.DateTimeParseException: Text '03.12.1990' could not be parsed: Unable to obtain LocalDate from TemporalAccessor: {Year=1990, DayOfMonth=3, MinuteOfHour=12},ISO of type java.time.format.Parsed

This is because I used your format pattern string of dd.mm.yyyy, where lowercase mm means minute. When we read the error message closely enough, it does state that the DateTimeFormatter interpreted 12 as minute of hour, which was not what we intended. While SimpleDateFormat tacitly accepted this (even when strict), java.time is more helpful in pointing out our mistake. What the message says only indirectly is that a month value is missing. We need to use uppercase MM for month. At the same time I am trying your date string with the typo:

    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("dd.MM.yyyy");
    System.out.println(LocalDate.parse("03.12.199o", dateFormatter));

We get:

java.time.format.DateTimeParseException: Text '03.12.199o' could not be parsed at index 6

Index 6 is where it says 199. It objects because we specified 4 digits and are supplying only 3. The docs say:

The count of letters determines the minimum field width …

It would also object to unparsed text after the date. In short, it seems to me that it gives you everything that you had expected.
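To see both rejections side by side, here is a minimal sketch. The class name JavaTimeStrictness and the helper tryParse are assumptions made for this example. It uses the default resolver style of DateTimeFormatter.ofPattern and prints the exception message rather than promising its exact wording.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class JavaTimeStrictness {

    private static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("dd.MM.yyyy");

    // Hypothetical helper: returns the parsed date, or null when java.time rejects the input.
    public static LocalDate tryParse(String text) {
        try {
            return LocalDate.parse(text, FORMATTER);
        } catch (DateTimeParseException e) {
            System.out.println("Rejected: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("03.12.1990"));   // well-formed input
        tryParse("03.12.199o");                       // only three year digits, then a letter
        tryParse("03.12.1990 and trailing text");     // unparsed text after the date
    }
}
```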
