Perl: parsing a Wells Fargo online bank statement

I am parsing a Wells Fargo online banking statement. The card number is truncated, and the number starting with a P or an S is a unique transaction identifier. I have changed the unique identifiers and the last four digits of the card numbers, so I consider myself safe from any privacy concerns. I filter each line through a long, ugly regex, which I am not sure is the right thing to do. There might be a better way; I don't know.

#!/usr/bin/perl
use strict;
use warnings;

#my $filename = 'wellsfargo_balanceStatement.txt';
#open(my $fh, '<:encoding(UTF-8)', $filename)
#  or die "Could not open file '$filename' $!";

#while (my $row = <$fh>)  {
while (my $row = <DATA>) {
  chomp $row;
  if ($row =~ /(\d{2}\/\d{2}\/\d{2})\s(PURCHASE).*\d{2}\/\d{2}(.*)\w\d{10}\d+.*\$(\d+\.\d+)/) {
      my $date=$1;
      my $purchs=$2;
      my $pur_plce=$3;
      my $pur_amt=$4;
      print "$date $purchs $pur_plce $pur_amt\n";
      }
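  select(undef, undef, undef, 0.5);  # half-second pause; sleep() only takes whole seconds, so sleep .5 would not pause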
}



__DATA__

  09/18/17 PURCHASE AUTHORIZED ON 09/17 CVS/PHARM 06062--200 W Manhattan NY P00000000032583371 CARD 4184   $4.87
  09/18/17 PURCHASE AUTHORIZED ON 09/16 JUBILEE MARKET NEW YORK NY S467259862756690 CARD 4184   $8.78
  09/18/17 PURCHASE AUTHORIZED ON 09/16 LOWE'S #3292 NEW YORK NY P00307259724475616 CARD 6029   $23.39
  09/18/17 PURCHASE AUTHORIZED ON 09/16 NYSC WEST END NEW YORK NY S587259513187673 CARD 4184   $39.00
  09/18/17 PURCHASE AUTHORIZED ON 09/16 JUBILEE MARKET NEW YORK NY S587259468801533 CARD 4184   $21.73
  09/18/17 PURCHASE AUTHORIZED ON 09/15 7-ELEVEN NEW YORK NY P00000000840668487 CARD 4184   $12.75
  09/18/17 PURCHASE AUTHORIZED ON 09/15 DUNKIN #351200 NEW YORK NY S307258635156794 CARD 4184   $2.82
  09/18/17 PURCHASE AUTHORIZED ON 09/15 DUNKIN #351200 NEW YORK NY S587258634843803 CARD 4184   $2.82

The transaction identifier begins with the letter S or P, followed by a number anywhere from 10 to 17 digits long. I thought I was pretty crafty with the regex \w\d{10}\d+; however, when the first letter of the unique identifier is a 'P' the output includes that letter plus 5 digits of the identifier, and when it is an 'S' it includes the letter plus 3 digits. Frankly, I don't even want the unique identifier, and I do not know how it is getting in there.

09/18/17 PURCHASE  CVS/PHARM 06062--200 W Manhattan NY P00000 4.87
09/18/17 PURCHASE  JUBILEE MARKET NEW YORK NY S467 8.78
09/18/17 PURCHASE  LOWE'S #3292 NEW YORK NY P00307 23.39
09/18/17 PURCHASE  NYSC WEST END NEW YORK NY S587 39.00
09/18/17 PURCHASE  JUBILEE MARKET NEW YORK NY S587 21.73
09/18/17 PURCHASE  7-ELEVEN NEW YORK NY P00000 12.75
09/18/17 PURCHASE  DUNKIN #351200 NEW YORK NY S307 2.82
09/18/17 PURCHASE  DUNKIN #351200 NEW YORK NY S587 2.82

Eventually I am going to comma-delimit the file and load it into Excel, so I can create bar graphs, pie charts, and whatever. This is what I want:

09/18/17 PURCHASE  CVS/PHARM 06062--200 W Manhattan NY 4.87
09/18/17 PURCHASE  JUBILEE MARKET NEW YORK NY 8.78
09/18/17 PURCHASE  LOWE'S #3292 NEW YORK NY 23.39
09/18/17 PURCHASE  NYSC WEST END NEW YORK NY 39.00
09/18/17 PURCHASE  JUBILEE MARKET NEW YORK NY 21.73
09/18/17 PURCHASE  7-ELEVEN NEW YORK NY 12.75
09/18/17 PURCHASE  DUNKIN #351200 NEW YORK NY 2.82
09/18/17 PURCHASE  DUNKIN #351200 NEW YORK NY 2.82

I'm a big fan of using .* in regexps as little as possible, because that makes them much easier to read and understand. Also, you can use delimiters other than /, which lets you use / in your regexp without escaping it.
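As for how part of the identifier ends up in the output: the capture (.*) is greedy, so it takes as much as it can and leaves \w\d{10}\d+ only the minimum it needs, which is 12 characters (and \w is happy to match a digit). Whatever is left over at the front of the identifier lands in the capture. Here is a small sketch of that, using the first sample line from the question; the variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First sample line from the question (single quotes, so $4.87 is literal).
my $row = '09/18/17 PURCHASE AUTHORIZED ON 09/17 CVS/PHARM 06062--200 W Manhattan NY P00000000032583371 CARD 4184   $4.87';

# The question's original pattern. The greedy third group swallows the start
# of the identifier and leaves only 12 characters for \w\d{10}\d+.
my @caps = $row =~ /(\d{2}\/\d{2}\/\d{2})\s(PURCHASE).*\d{2}\/\d{2}(.*)\w\d{10}\d+.*\$(\d+\.\d+)/;
my $greedy = $caps[2];

# Pinning the identifier between a space and " CARD" makes \w+ take all of it.
my ($anchored) = $row =~ m!ON \d\d/\d\d (.*) \w+ CARD!;

print "greedy:   [$greedy]\n";
print "anchored: [$anchored]\n";
```

With the 18-character P identifier, the greedy capture keeps the first 6 characters (P00000); with a 16-character S identifier it would keep 4, which matches the output shown in the question.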

I'd suggest something like this:

if ($row =~ m!(\d\d/\d\d/\d\d) (PURCHASE) \w+ ON \d\d/\d\d (.*) \w+ CARD \d+\s+\$(\d+\.\d+)$!) {

I've also taken the liberty of changing \d{2} to \d\d. I find that much easier to read, because my brain doesn't have to go into "hey, there has to be a certain number of \d here, the { is important" mode.
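Since the end goal is a comma-delimited file for Excel, here is a rough sketch of how the pattern above might be dropped into a complete script. The helper names parse_purchase and csv_line are made up for this example, the sample lines come from the question, and the quoting is deliberately minimal (it only quotes fields containing a comma, a double quote, or a newline). For real statements, a CSV module such as Text::CSV from CPAN would probably be the more robust choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (date, type, place, amount) for a purchase
# line, or an empty list when the line does not match.
sub parse_purchase {
    my ($row) = @_;
    return $row =~ m!(\d\d/\d\d/\d\d) (PURCHASE) \w+ ON \d\d/\d\d (.*) \w+ CARD \d+\s+\$(\d+\.\d+)$!;
}

# Hypothetical helper: joins fields with commas, quoting only when needed.
sub csv_line {
    my @out;
    for my $field (@_) {
        my $f = $field;
        if ($f =~ /[",\n]/) {
            $f =~ s/"/""/g;      # double any embedded quotes
            $f = qq{"$f"};
        }
        push @out, $f;
    }
    return join ',', @out;
}

# A few sample lines from the question; the non-interpolating heredoc keeps
# the dollar signs literal.
my $statement = <<'END';
  09/18/17 PURCHASE AUTHORIZED ON 09/17 CVS/PHARM 06062--200 W Manhattan NY P00000000032583371 CARD 4184   $4.87
  09/18/17 PURCHASE AUTHORIZED ON 09/16 JUBILEE MARKET NEW YORK NY S467259862756690 CARD 4184   $8.78
  09/18/17 PURCHASE AUTHORIZED ON 09/16 LOWE'S #3292 NEW YORK NY P00307259724475616 CARD 6029   $23.39
END

my @csv;
for my $row (split /\n/, $statement) {
    my @fields = parse_purchase($row) or next;   # skip lines that don't match
    push @csv, csv_line(@fields);
}

print "$_\n" for @csv;
```

To run it against the real file, the loop over the heredoc could be swapped for the commented-out open/while from the question.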
