
SimpleDateFormat.parse() ignores the number of characters in pattern

I'm trying to parse a date String which can have three different formats. Even though the String should not match the second pattern, it somehow does and therefore returns a wrong date.

That's my code:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class Start {

    public static void main(String[] args) {
        SimpleDateFormat sdf = new SimpleDateFormat("dd.MM.yyyy");
        try{
            System.out.println(sdf.format(parseDate("2013-01-31")));
        } catch(ParseException ex){
            System.out.println("Unable to parse");
        }
    }

    public static Date parseDate(String dateString) throws ParseException{
        SimpleDateFormat sdf = new SimpleDateFormat("dd.MM.yyyy");
        SimpleDateFormat sdf2 = new SimpleDateFormat("dd-MM-yyyy");
        SimpleDateFormat sdf3 = new SimpleDateFormat("yyyy-MM-dd");

        Date parsedDate;
        try {
            parsedDate = sdf.parse(dateString);
        } catch (ParseException ex) {
            try{
                parsedDate = sdf2.parse(dateString);
            } catch (ParseException ex2){
                parsedDate = sdf3.parse(dateString);    
            }
        }
        return parsedDate;
    }
}

With the input 2013-01-31 I get the output 05.07.0036.

If I try to parse 31-01-2013 or 31.01.2013, I get 31.01.2013 as expected.

I noticed that the program gives me exactly the same output if I set the patterns like this:

SimpleDateFormat sdf = new SimpleDateFormat("d.M.y");
SimpleDateFormat sdf2 = new SimpleDateFormat("d-M-y");
SimpleDateFormat sdf3 = new SimpleDateFormat("y-M-d");

Why does it ignore the number of chars in my pattern?

There are serious issues with SimpleDateFormat. The default lenient setting can produce garbage answers, and I cannot think of a case where lenient has any benefit. The lenient setting is not a reliable approach to producing reasonable interpretations of human-entered date variations. This should never have been the default setting.
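To see the default in action, here is a small sketch that runs the question's input through the question's second pattern, once with the default lenient setting and once with setLenient(false). The class and method names (LenientDemo, parseAndShow) are made up for illustration; the lenient result is the 05.07.0036 reported in the question.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class LenientDemo {

    // Parses text with the given pattern and leniency, then formats the
    // result as dd.MM.yyyy; returns "ParseException" if parsing fails.
    public static String parseAndShow(String pattern, String text, boolean lenient) {
        SimpleDateFormat parser = new SimpleDateFormat(pattern);
        parser.setLenient(lenient);
        try {
            Date date = parser.parse(text);
            return new SimpleDateFormat("dd.MM.yyyy").format(date);
        } catch (ParseException ex) {
            return "ParseException";
        }
    }

    public static void main(String[] args) {
        // A new SimpleDateFormat is lenient unless told otherwise
        System.out.println(new SimpleDateFormat("dd-MM-yyyy").isLenient());  // true
        // Lenient: day 2013 of January, year 31, rolls over into year 36
        System.out.println(parseAndShow("dd-MM-yyyy", "2013-01-31", true));  // 05.07.0036
        // Non-lenient: a day of month of 2013 is rejected
        System.out.println(parseAndShow("dd-MM-yyyy", "2013-01-31", false)); // ParseException
    }
}
```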

Use DateTimeFormatter instead if you can; see Ole VV's answer. This newer approach is superior and produces thread-safe, immutable instances. If you share a SimpleDateFormat instance between threads, it can produce garbage results without errors or exceptions. Sadly, my suggested implementation inherits this bad behavior.

Disabling lenient is only part of the solution. You can still end up with garbage results that are hard to catch in testing. See the comments in the code below for examples.
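A quick sketch of two of those leftover gaps, with setLenient(false) already applied: a single-digit day accepted for dd, and trailing text after a complete match. NonLenientGaps and accepts are illustrative names. The trailing-text case passes because DateFormat.parse(String) is documented as not necessarily using the entire string.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class NonLenientGaps {

    // Returns true if a non-lenient SimpleDateFormat.parse(String) accepts the text.
    public static boolean accepts(String pattern, String text) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setLenient(false);
        try {
            sdf.parse(text); // only a prefix of the text has to parse
            return true;
        } catch (ParseException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Single-digit day accepted for "dd"
        System.out.println(accepts("yyyy/MM/dd", "2010/01/5"));        // true
        // Trailing garbage is silently ignored
        System.out.println(accepts("yyyyMMdd", "20100901andGarbage")); // true
        // Out-of-range fields (here month 0) are rejected once lenient is off
        System.out.println(accepts("yyyyMMdd", "2010-09-01"));         // false
    }
}
```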

Here is an extension of SimpleDateFormat that forces a strict pattern match. This should have been the default behavior for that class.

import java.text.DateFormatSymbols;
import java.text.ParseException;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/**
 * Extension of SimpleDateFormat that implements strict matching.
 * parse(text) will only return a Date if text exactly matches the
 * pattern. 
 * 
 * This is needed because SimpleDateFormat does not enforce strict 
 * matching. First there is the lenient setting, which is true
 * by default. This allows text that does not match the pattern and
 * garbage to be interpreted as valid date/time information. For example,
 * parsing "2010-09-01" using the format "yyyyMMdd" yields the date 
 * 2009/12/09! Is this bizarre interpretation the ninth day of the  
 * zeroth month of 2010? If you are dealing with inputs that are not 
 * strictly formatted, you WILL get bad results. You can override lenient  
 * with setLenient(false), but this strangeness should not be the default. 
 *
 * Second, setLenient(false) still does not strictly interpret the pattern. 
 * For example "2010/01/5" will match "yyyy/MM/dd". And data disagreement like 
 * "1999/2011" for the pattern "yyyy/yyyy" is tolerated (yielding 2011). 
 *
 * Third, setLenient(false) still allows garbage after the pattern match. 
 * For example: "20100901" and "20100901andGarbage" will both match "yyyyMMdd". 
 * 
 * This class restricts this undesirable behavior, and makes parse() and 
 * format() functional inverses, which is what you would expect. Thus
 * text.equals(format(parse(text))) when parse returns a non-null result.
 * 
 * @author zobell
 *
 */
public class StrictSimpleDateFormat extends SimpleDateFormat {

    protected boolean strict = true;

    public StrictSimpleDateFormat() {
        super();
        setStrict(true);
    }

    public StrictSimpleDateFormat(String pattern) {
        super(pattern);
        setStrict(true);
    }

    public StrictSimpleDateFormat(String pattern, DateFormatSymbols formatSymbols) {
        super(pattern, formatSymbols);
        setStrict(true);
    }

    public StrictSimpleDateFormat(String pattern, Locale locale) {
        super(pattern, locale);
        setStrict(true);
    }

    /**
     * Set the strict setting. If strict == true (the default)
     * then parsing requires an exact match to the pattern. Setting
     * strict = false will tolerate text after the pattern match. 
     * @param strict
     */
    public void setStrict(boolean strict) {
        this.strict = strict;
        // strict with lenient does not make sense. Really lenient does
        // not make sense in any case.
        if (strict)
            setLenient(false); 
    }

    public boolean getStrict() {
        return strict;
    }

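    /**
     * Parse text to a Date. Exact match of the pattern is required.
     * Parse and format are now inverse functions, so this is
     * required to be true for valid text date information:
     * text.equals(format(parse(text)))
     * @param text
     * @param pos
     * @return the parsed Date, or null if text does not match exactly
     */
    @Override
    public Date parse(String text, ParsePosition pos) {
        int startIndex = pos.getIndex(); // remember where parsing started
        Date d = super.parse(text, pos);
        if (strict && d != null) {
           String format = this.format(d);
           // The formatted date must cover everything from startIndex to the end
           if (startIndex + format.length() != text.length() ||
                 !text.endsWith(format)) {
              // Not an exact match: restore the position and flag the error,
              // so that DateFormat.parse(String) throws ParseException
              pos.setIndex(startIndex);
              pos.setErrorIndex(startIndex);
              d = null;
           }
        }
        return d;
    }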
}
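If subclassing SimpleDateFormat is not wanted, the same round-trip idea can be sketched as a plain static helper. This is an illustration rather than part of the answer above: StrictParseSketch and parseExact are hypothetical names, and the helper returns null instead of throwing when the text does not match exactly.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class StrictParseSketch {

    // Hypothetical helper: returns the Date only if the whole text was consumed
    // and formatting the result reproduces the text exactly; otherwise null.
    public static Date parseExact(String pattern, String text) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setLenient(false);
        ParsePosition pos = new ParsePosition(0);
        Date d = sdf.parse(text, pos);
        if (d == null || pos.getIndex() != text.length()) {
            return null; // unparseable, or characters left over after the match
        }
        // Round trip: rejects e.g. a single-digit day for "dd"
        return sdf.format(d).equals(text) ? d : null;
    }

    public static void main(String[] args) {
        System.out.println(parseExact("yyyy-MM-dd", "2013-01-31") != null);         // true
        System.out.println(parseExact("dd-MM-yyyy", "2013-01-31") != null);         // false
        System.out.println(parseExact("yyyy/MM/dd", "2010/01/5") != null);          // false
        System.out.println(parseExact("yyyyMMdd", "20100901andGarbage") != null);   // false
    }
}
```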

java.time

java.time is the modern Java date and time API and behaves the way you had expected. So it's a matter of a simple translation of your code:

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class Start {

    private static final DateTimeFormatter formatter1 = DateTimeFormatter.ofPattern("dd.MM.yyyy");
    private static final DateTimeFormatter formatter2 = DateTimeFormatter.ofPattern("dd-MM-yyyy");
    private static final DateTimeFormatter formatter3 = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    public static LocalDate parseDate(String dateString) {
        LocalDate parsedDate;
        try {
            parsedDate = LocalDate.parse(dateString, formatter1);
        } catch (DateTimeParseException dtpe1) {
            try {
                parsedDate = LocalDate.parse(dateString, formatter2);
            } catch (DateTimeParseException dtpe2) {
                // Throws DateTimeParseException if none of the three formats match
                parsedDate = LocalDate.parse(dateString, formatter3);
            }
        }
        return parsedDate;
    }
}

(I put the formatters outside your method so they are not created anew for each call. You can put them inside if you prefer.)

Let's try it out:

    LocalDate date = parseDate("2013-01-31");
    System.out.println(date);

Output is:

2013-01-31

For numbers, DateTimeFormatter.ofPattern takes the number of pattern letters to be the minimum field width. It furthermore assumes that the day of month is never more than two digits. So when trying the format dd-MM-yyyy, it successfully parsed 20 as the day of month and then threw a DateTimeParseException because there wasn't a hyphen (dash) after 20. The first formatter, dd.MM.yyyy, had already failed at the same spot because it expected a dot there. Then the method went on to try the next formatter.
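The position where each formatter gives up can be read from DateTimeParseException.getErrorIndex(). A small sketch feeding the question's input to the three patterns; ParseIndexDemo and failureIndex are illustrative names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class ParseIndexDemo {

    // Returns the index at which parsing failed, or -1 if it succeeded.
    public static int failureIndex(String pattern, String text) {
        try {
            LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
            return -1;
        } catch (DateTimeParseException ex) {
            return ex.getErrorIndex();
        }
    }

    public static void main(String[] args) {
        // "dd" reads "20"; the literal '.' or '-' is then expected at index 2, but '1' is found
        System.out.println(failureIndex("dd.MM.yyyy", "2013-01-31")); // 2
        System.out.println(failureIndex("dd-MM-yyyy", "2013-01-31")); // 2
        System.out.println(failureIndex("yyyy-MM-dd", "2013-01-31")); // -1
    }
}
```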

What went wrong in your code

The SimpleDateFormat class that you tried to use is notoriously troublesome and, fortunately, long outdated. You met but one of its many problems. Repeating, from the answer by Teetoo, the important sentence in the documentation about how it handles numbers:

For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.

So new SimpleDateFormat("dd-MM-yyyy") happily parses 2013 as the day of month, 01 as the month and 31 as the year. Next we would have expected it to throw an exception, because January of year 31 doesn't have 2013 days. But a SimpleDateFormat with default settings doesn't do that. It just keeps counting days through the following months and years and ends up at July 5 of year 36, five and a half years later, which is the result you observed.
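The rollover can be reproduced without any parsing by handing the same field values to a lenient GregorianCalendar, the calendar SimpleDateFormat uses by default for most locales. A sketch assuming default (lenient) calendar settings; RolloverDemo and normalize are illustrative names. Before 1582 GregorianCalendar applies Julian rules, but years 32 and 36 are leap years under either calendar, so the count is the same.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

public class RolloverDemo {

    // Sets year, month and day on a lenient calendar (the default) and
    // returns the normalized date as "day.month.year".
    public static String normalize(int year, int month, int day) {
        Calendar cal = new GregorianCalendar(year, month, day);
        return cal.get(Calendar.DAY_OF_MONTH) + "."
                + (cal.get(Calendar.MONTH) + 1) + "."
                + cal.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        // Day 2013 of January, year 31
        System.out.println(normalize(31, Calendar.JANUARY, 2013)); // 5.7.36
    }
}
```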

Link

Oracle tutorial: Date Time explaining how to use java.time.

A workaround could be to test for the yyyy-MM-dd format with a regex:

public static Date parseDate(String dateString) throws ParseException {
    SimpleDateFormat sdf = new SimpleDateFormat("dd.MM.yyyy");
    SimpleDateFormat sdf2 = new SimpleDateFormat("dd-MM-yyyy");
    SimpleDateFormat sdf3 = new SimpleDateFormat("yyyy-MM-dd");

    Date parsedDate;
    try {
        if (dateString.matches("\\d{4}-\\d{2}-\\d{2}")) {
            parsedDate = sdf3.parse(dateString);
        } else {
            throw new ParseException("", 0);
        }
    } catch (ParseException ex) {
        try {
            parsedDate = sdf2.parse(dateString);
        } catch (ParseException ex2) {
            parsedDate = sdf.parse(dateString);
        }
    }
    return parsedDate;
}

It is documented in the SimpleDateFormat javadoc:

For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.
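Both halves of that sentence can be checked with a short sketch: the letter count changes the formatted output, is ignored when reading 5.1.2013, and is what splits the digits when fields are adjacent, as in yyyyMMdd. PatternCountDemo and convert are illustrative names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class PatternCountDemo {

    // Parses text with one pattern and formats the result with another;
    // returns "unparseable" if parsing fails.
    public static String convert(String text, String fromPattern, String toPattern) {
        try {
            Date date = new SimpleDateFormat(fromPattern).parse(text);
            return new SimpleDateFormat(toPattern).format(date);
        } catch (ParseException ex) {
            return "unparseable";
        }
    }

    public static void main(String[] args) {
        // Formatting: "dd" pads to two digits, "d" uses as few digits as needed
        System.out.println(convert("5.1.2013", "d.M.y", "dd.MM.yyyy"));      // 05.01.2013
        System.out.println(convert("05.01.2013", "dd.MM.yyyy", "d.M.y"));    // 5.1.2013
        // Parsing: the count is ignored, so "dd.MM.yyyy" also reads "5.1.2013"
        System.out.println(convert("5.1.2013", "dd.MM.yyyy", "dd.MM.yyyy")); // 05.01.2013
        // Adjacent fields: the count is what splits "20130105" into 2013, 01, 05
        System.out.println(convert("20130105", "yyyyMMdd", "dd.MM.yyyy"));   // 05.01.2013
    }
}
```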

Thanks @Teetoo. That helped me to find the solution to my problem:

If I want the parse function to match the pattern exactly, I have to set "lenient" (SimpleDateFormat.setLenient) of my SimpleDateFormat to false:

SimpleDateFormat sdf = new SimpleDateFormat("d.M.y");
sdf.setLenient(false);
SimpleDateFormat sdf2 = new SimpleDateFormat("d-M-y");
sdf2.setLenient(false);
SimpleDateFormat sdf3 = new SimpleDateFormat("y-M-d");
sdf3.setLenient(false);

This will still parse the date if I only use one pattern letter for each segment, but it will recognize that 2013 can't be the day, and therefore the input does not match the second pattern. In combination with a length check, I receive exactly what I want.
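Put together, the approach described here might look like the sketch below. The answer does not show its length check, so the fixed length of 10 characters is an assumption (it fits inputs such as 31.01.2013 and 2013-01-31); LengthCheckedParser, parseDate and describe are illustrative names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class LengthCheckedParser {

    // The question's patterns, tried in the same order.
    private static final String[] PATTERNS = {"d.M.y", "d-M-y", "y-M-d"};

    // Assumption: every accepted input is exactly 10 characters long.
    private static final int EXPECTED_LENGTH = 10;

    public static Date parseDate(String dateString) throws ParseException {
        if (dateString.length() != EXPECTED_LENGTH) {
            throw new ParseException("Unexpected length: " + dateString, 0);
        }
        for (String pattern : PATTERNS) {
            SimpleDateFormat sdf = new SimpleDateFormat(pattern);
            sdf.setLenient(false); // rejects day 2013, month 13 and the like
            try {
                return sdf.parse(dateString);
            } catch (ParseException ex) {
                // Fall through to the next pattern
            }
        }
        throw new ParseException("Unparseable date: " + dateString, 0);
    }

    // Formats the parsed date as dd.MM.yyyy, or returns "unparseable".
    public static String describe(String dateString) {
        try {
            return new SimpleDateFormat("dd.MM.yyyy").format(parseDate(dateString));
        } catch (ParseException ex) {
            return "unparseable";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("2013-01-31")); // 31.01.2013
        System.out.println(describe("31-01-2013")); // 31.01.2013
        System.out.println(describe("31.01.2013")); // 31.01.2013
        System.out.println(describe("2013-1-31"));  // unparseable (wrong length)
    }
}
```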

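Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.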

 
© 2020-2024 STACKOOM.COM