Perl or Python: Convert date from dd/mm/yyyy to yyyy-mm-dd

I have lots of dates in a column in a CSV file that I need to convert from dd/mm/yyyy to yyyy-mm-dd format. For example, 17/01/2010 should be converted to 2010-01-17.

How can I do this in Perl or Python?

If your data is guaranteed to be well formed, consisting of nothing but a single date in DD/MM/YYYY format, then this works:

# FIRST METHOD
my $ndate = join("-" => reverse split(m[/], $date));

That works on a $date holding "07/04/1776" but fails on "this 17/01/2010 and that 01/17/2010 there". Instead, use:

# SECOND METHOD
($ndate = $date) =~ s{
    \b
      ( \d \d   )
    / ( \d \d   )
    / ( \d {4}  )
    \b
}{$3-$2-$1}gx;
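
For readers who want the Python side of the question, here is a minimal sketch of the same substitution using the standard re module. The names DATE_RE and swap_dates are made up for illustration, and the pattern is assumed to be a direct transcription of the second method rather than a tested drop-in replacement for it.

```python
import re

# Hypothetical names; the pattern mirrors the SECOND METHOD above.
DATE_RE = re.compile(r"\b(\d\d)/(\d\d)/(\d{4})\b")


def swap_dates(text: str) -> str:
    """Rewrite every dd/mm/yyyy found in text as yyyy-mm-dd."""
    return DATE_RE.sub(r"\3-\2-\1", text)


print(swap_dates("this 17/01/2010 and that 17/01/2010 there"))
```

Because the pattern is anchored with \b rather than to the whole string, surrounding text should pass through untouched.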

If you prefer a more "grammatical" regex, so that it's easier to maintain and update, you can instead use this:

# THIRD METHOD
($ndate = $date) =~ s{
    (?&break)

              (?<DAY>    (?&day)    )
    (?&slash) (?<MONTH>  (?&month)  )
    (?&slash) (?<YEAR>   (?&year)   )

    (?&break)

    (?(DEFINE)
        (?<break> \b     )
        (?<slash> /      )
        (?<year>  \d {4} )
        (?<month> \d {2} )
        (?<day>   \d {2} )
    )
}{
    join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;
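
Python's re has no (?(DEFINE)...) block, but a rough approximation of the grammatical style is to keep the named fragments in a dictionary and interpolate them into a verbose pattern. This is only a sketch under that assumption; PARTS, DATE_RE, and swap_dates_named are hypothetical names.

```python
import re

# Named fragments stand in for the (?(DEFINE)...) block, which re lacks.
PARTS = {
    "brk": r"\b",
    "slash": r"/",
    "day": r"\d{2}",
    "month": r"\d{2}",
    "year": r"\d{4}",
}

# str.format fills in the fragments; re.VERBOSE ignores the layout whitespace.
DATE_RE = re.compile(
    r"""
    {brk}
              (?P<DAY>   {day}   )
    {slash}   (?P<MONTH> {month} )
    {slash}   (?P<YEAR>  {year}  )
    {brk}
    """.format(**PARTS),
    re.VERBOSE,
)


def swap_dates_named(text: str) -> str:
    """Rewrite dd/mm/yyyy as yyyy-mm-dd using the named groups."""
    return DATE_RE.sub(
        lambda m: "-".join(m.group("YEAR", "MONTH", "DAY")), text
    )


print(swap_dates_named("this 17/01/2010 and that 17/01/2010 there"))
```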

Finally, if you have Unicode data, you might want to be a bit more careful.

# FOURTH METHOD
($ndate = $date) =~ s{
    (?&break_before)
              (?<DAY>    (?&day)    )
    (?&slash) (?<MONTH>  (?&month)  )
    (?&slash) (?<YEAR>   (?&year)   )
    (?&break_after)

    (?(DEFINE)
        (?<slash>     /                  )
        (?<start>     \A                 )
        (?<finish>    \z                 )

        # don't really want to use \D or [^0-9] here:
        (?<break_before>
           (?<= [\pC\pP\pS\p{Space}] )
         | (?<= \A                )
        )
        (?<break_after>
            (?= [\pC\pP\pS\p{Space}]
              | \z
            )
        )
        (?<digit> \d            )
        (?<year>  (?&digit) {4} )
        (?<month> (?&digit) {2} )
        (?<day>   (?&digit) {2} )
    )
}{
    join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;
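
A comparable check can be sketched in Python by testing the neighbouring characters with unicodedata instead of \p{...} classes, which the standard re module does not provide. The helpers is_break and swap_dates_unicode are hypothetical names, str.isspace() is assumed to be a close enough stand-in for \p{Space}, and a candidate that fails the check is simply left as it was.

```python
import re
import unicodedata

# Candidate dates; the break test happens in Python code, not in the regex.
CANDIDATE = re.compile(r"(\d\d)/(\d\d)/(\d{4})")


def is_break(ch: str) -> bool:
    """True for whitespace or for general categories C*, P*, and S*."""
    return ch.isspace() or unicodedata.category(ch)[0] in "CPS"


def swap_dates_unicode(text: str) -> str:
    def repl(m: re.Match) -> str:
        # Empty string means the match touches the start or end of the text.
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        if (before and not is_break(before)) or (after and not is_break(after)):
            return m.group(0)  # not a standalone date: leave unchanged
        day, month, year = m.groups()
        return f"{year}-{month}-{day}"

    return CANDIDATE.sub(repl, text)


print(swap_dates_unicode("from \u201c17/01/2010\u201d through\xa017/01/2010"))
```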

You can see how each of these four approaches performs when confronted with sample input strings like these:

my $sample  = q(17/01/2010);
my @strings =  (
    $sample,  # trivial case

    # multiple case
    "this $sample and that $sample there",

    # multiple case with non-ASCII BMP code points
    # U+201C and U+201D are LEFT and RIGHT DOUBLE QUOTATION MARK
    "from \x{201c}$sample\x{201d} through\xA0$sample",

    # multiple case with non-ASCII code points
    #   from both the BMP and the SMP 
    # code point U+02013 is EN DASH, props \pP \p{Pd}
    # code point U+10179 is GREEK YEAR SIGN, props \pS \p{So}
    # code point U+110BD is KAITHI NUMBER SIGN, props \pC \p{Cf}
    "\x{10179}$sample\x{2013}\x{110BD}$sample",
);

Now letting $date be a foreach iterator through that array, we get this output:

Original is:   17/01/2010
First method:  2010-01-17
Second method: 2010-01-17
Third method:  2010-01-17
Fourth method: 2010-01-17

Original is:   this 17/01/2010 and that 17/01/2010 there
First method:  2010 there-01-2010 and that 17-01-this 17
Second method: this 2010-01-17 and that 2010-01-17 there
Third method:  this 2010-01-17 and that 2010-01-17 there
Fourth method: this 2010-01-17 and that 2010-01-17 there

Original is:   from “17/01/2010” through 17/01/2010
First method:  2010-01-2010” through 17-01-from “17
Second method: from “2010-01-17” through 2010-01-17
Third method:  from “2010-01-17” through 2010-01-17
Fourth method: from “2010-01-17” through 2010-01-17

Original is:   𐅹17/01/2010–𑂽17/01/2010
First method:  2010-01-2010–𑂽17-01-𐅹17
Second method: 𐅹2010-01-17–𑂽2010-01-17
Third method:  𐅹2010-01-17–𑂽2010-01-17
Fourth method: 𐅹2010-01-17–𑂽2010-01-17

Now let's suppose that you actually do want to match non-ASCII digits. For example:

   U+660  ARABIC-INDIC DIGIT ZERO
   U+661  ARABIC-INDIC DIGIT ONE
   U+662  ARABIC-INDIC DIGIT TWO
   U+663  ARABIC-INDIC DIGIT THREE
   U+664  ARABIC-INDIC DIGIT FOUR
   U+665  ARABIC-INDIC DIGIT FIVE
   U+666  ARABIC-INDIC DIGIT SIX
   U+667  ARABIC-INDIC DIGIT SEVEN
   U+668  ARABIC-INDIC DIGIT EIGHT
   U+669  ARABIC-INDIC DIGIT NINE

or even

 U+1D7F6  MATHEMATICAL MONOSPACE DIGIT ZERO
 U+1D7F7  MATHEMATICAL MONOSPACE DIGIT ONE
 U+1D7F8  MATHEMATICAL MONOSPACE DIGIT TWO
 U+1D7F9  MATHEMATICAL MONOSPACE DIGIT THREE
 U+1D7FA  MATHEMATICAL MONOSPACE DIGIT FOUR
 U+1D7FB  MATHEMATICAL MONOSPACE DIGIT FIVE
 U+1D7FC  MATHEMATICAL MONOSPACE DIGIT SIX
 U+1D7FD  MATHEMATICAL MONOSPACE DIGIT SEVEN
 U+1D7FE  MATHEMATICAL MONOSPACE DIGIT EIGHT
 U+1D7FF  MATHEMATICAL MONOSPACE DIGIT NINE

So imagine you have a date in mathematical monospace digits, like this:

$date = "\x{1D7F7}\x{1D7FD}/\x{1D7F7}\x{1D7F6}/\x{1D7F8}\x{1D7F6}\x{1D7F7}\x{1D7F6}";

The Perl code will work just fine on that:

Original is:   𝟷𝟽/𝟷𝟶/𝟸𝟶𝟷𝟶
First method:  𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Second method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Third method:  𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Fourth method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
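
For comparison, in Python 3 the \d class in a str pattern is documented to match any Unicode decimal digit, so the simple substitution should behave similarly on this string. The sketch below also shows one possible way to fold such digits down to ASCII with unicodedata.decimal; to_ascii_digits is a hypothetical helper, not something from the answers here.

```python
import re
import unicodedata

# The same date as above, written in MATHEMATICAL MONOSPACE digits.
date = (
    "\U0001D7F7\U0001D7FD/\U0001D7F7\U0001D7F6/"
    "\U0001D7F8\U0001D7F6\U0001D7F7\U0001D7F6"
)

# \d on a str pattern matches any character in category Nd.
swapped = re.sub(r"\b(\d\d)/(\d\d)/(\d{4})\b", r"\3-\2-\1", date)


def to_ascii_digits(text: str) -> str:
    """Replace each Unicode decimal digit with its ASCII equivalent."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digits
        out.append(ch if value is None else str(value))
    return "".join(out)


print(swapped)
print(to_ascii_digits(swapped))
```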

I think you'll find that Python has a pretty brain‐damaged Unicode model whose lack of support for abstract characters and strings irrespective of content makes it ridiculously difficult to write things like this.

It's also tough to write legible regular expressions in Python where you decouple the declaration of the subexpressions from their execution, since (?(DEFINE)...) blocks are not supported there. Heck, Python's re module doesn't even support Unicode properties. It's just not suitable for Unicode regex work because of this.

But hey, if you think that's bad in Python compared to Perl (and it certainly is), just try any other language. I haven't found one that isn't still worse for this sort of work.

As you see, you run into real problems when you ask for regex solutions from multiple languages. First of all, the solutions are difficult to compare because of the different regex flavors. But also because no other language can compare with Perl for power, expressivity, and maintainability in its regular expressions. This may become even more obvious once arbitrary Unicode enters the picture.

So if you just wanted Python, you should have asked for only that. Otherwise it's a terribly unfair contest that Python will nearly always lose; it's just too messy to get things like this correct in Python, let alone both correct and clean. That's asking more of it than it can produce.

In contrast, Perl's regexes excel at both of those.

>>> from datetime import datetime
>>> datetime.strptime('02/11/2010', '%d/%m/%Y').strftime('%Y-%m-%d')
'2010-11-02'

or, a more hackish way (which doesn't check that the values are valid):

>>> '-'.join('02/11/2010'.split('/')[::-1])
'2010-11-02'
>>> '-'.join(reversed('02/11/2010'.split('/')))
'2010-11-02'

Use Time::Piece (in core since 5.9.5). It is very similar to the accepted Python solution, since it provides strptime and strftime functions:

use Time::Piece;
my $dt_str = Time::Piece->strptime('13/10/1979', '%d/%m/%Y')->strftime('%Y-%m-%d');

or

$ perl -MTime::Piece
print Time::Piece->strptime('13/10/1979', '%d/%m/%Y')->strftime('%Y-%m-%d');
1979-10-13
$ 

Go with Perl: the datetime Python package is just broken. You could just do it with regexes to swap the date parts around, e.g.

echo "17/01/2010" | perl -pe 's{(\d+)/(\d+)/(\d+)}{$3-$2-$1}g'

If you do need to parse these dates (e.g. to compute their day of week or do other calendar-type operations), look into DateTimeX::Easy (you can install it with apt-get under Ubuntu):

perl -MDateTimeX::Easy -e 'print DateTimeX::Easy->parse("17/01/2010")->ymd("-")'

Perl:

while (<>) {
  # Lookarounds reject longer digit runs and keep the neighbouring characters intact.
  s/(?<!\d)(\d\d)\/(\d\d)\/(\d{4})(?!\d)/$3-$2-$1/g;
  print $_;
}

Then you just have to run:

perl MyScript.pl < oldfile.txt > newfile.txt

Perl:

# $date must already hold the string; "my $date =~ ..." would act on a fresh, undefined variable.
$date =~ s/(\d+)\/(\d+)\/(\d+)/$3-$2-$1/;

In Perl you can do:

use strict;
use warnings;
while (<>) {
    chomp;
    my ($d, $m, $y) = split /\//;   # split dd/mm/yyyy on the slashes
    my $newDate = "$y-$m-$d";
    print "$newDate\n";             # emit the converted date
}

In glorious perl-oneliner form:

echo 17/01/2010 | perl -lpe '$_ = join("-", reverse split m{/})'

But seriously I would do it like this:

#!/usr/bin/env perl
while (<>) {
    chomp;
    print join('-', reverse split /\//), "\n";
}

Which will work on a pipe, converting and printing one date per line.
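
Since the question is about a column in a CSV file, here is a hedged Python sketch of the same filter idea using the csv module. The function name convert_column, the column index, and the sample data are all assumptions; cells that do not parse as dd/mm/yyyy (such as a header) are left unchanged.

```python
import csv
import io
from datetime import datetime


def convert_column(src, dst, column: int) -> None:
    """Copy CSV rows from src to dst, rewriting one column's dates."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        try:
            parsed = datetime.strptime(row[column], "%d/%m/%Y")
            row[column] = parsed.strftime("%Y-%m-%d")
        except (ValueError, IndexError):
            pass  # header, blank, short, or malformed row: keep as is
        writer.writerow(row)


# In-memory sample data (made up) so the sketch runs without any files.
sample = io.StringIO("name,date\nalice,17/01/2010\nbob,02/11/2010\n")
out = io.StringIO()
convert_column(sample, out, column=1)
print(out.getvalue(), end="")
```

For real files, the two StringIO objects would be replaced by files opened with newline="", as the csv documentation recommends.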
