Perl: parsing a Wells Fargo online bank statement
I am parsing a Wells Fargo online banking statement. The card number is truncated, and the number starting with a P or an S is a unique transaction identifier. I have altered the unique identifiers and the last four digits of the card numbers, so I consider myself safe from any privacy concerns. I filter each line through one long, ugly regex, which I am not sure is what you are supposed to do. There might be a better way; I don't know.
#!/usr/bin/perl
use strict;
use warnings;
#my $filename = 'wellsfargo_balanceStatement.txt';
#open(my $fh, '<:encoding(UTF-8)', $filename)
# or die "Could not open file '$filename' $!";
#while (my $row = <$fh>) {
while (my $row = <DATA>) {
    chomp $row;
    if ($row =~ /(\d{2}\/\d{2}\/\d{2})\s(PURCHASE).*\d{2}\/\d{2}(.*)\w\d{10}\d+.*\$(\d+\.\d+)/) {
        my $date=$1;
        my $purchs=$2;
        my $pur_plce=$3;
        my $pur_amt=$4;
        print "$date $purchs $pur_plce $pur_amt\n";
    }
    sleep .5 ; # built-in sleep takes whole seconds, so .5 is truncated to 0
}
__DATA__
09/18/17 PURCHASE AUTHORIZED ON 09/17 CVS/PHARM 06062--200 W Manhattan NY P00000000032583371 CARD 4184 $4.87
09/18/17 PURCHASE AUTHORIZED ON 09/16 JUBILEE MARKET NEW YORK NY S467259862756690 CARD 4184 $8.78
09/18/17 PURCHASE AUTHORIZED ON 09/16 LOWE'S #3292 NEW YORK NY P00307259724475616 CARD 6029 $23.39
09/18/17 PURCHASE AUTHORIZED ON 09/16 NYSC WEST END NEW YORK NY S587259513187673 CARD 4184 $39.00
09/18/17 PURCHASE AUTHORIZED ON 09/16 JUBILEE MARKET NEW YORK NY S587259468801533 CARD 4184 $21.73
09/18/17 PURCHASE AUTHORIZED ON 09/15 7-ELEVEN NEW YORK NY P00000000840668487 CARD 4184 $12.75
09/18/17 PURCHASE AUTHORIZED ON 09/15 DUNKIN #351200 NEW YORK NY S307258635156794 CARD 4184 $2.82
09/18/17 PURCHASE AUTHORIZED ON 09/15 DUNKIN #351200 NEW YORK NY S587258634843803 CARD 4184 $2.82
The transaction identifier begins with the letter S or P, followed by a number anywhere from 10 to 17 digits long. I thought I was being pretty crafty with the regex \w\d{10}\d+; however, when the identifier begins with a 'P' the output includes the P and 5 digits, and when it begins with an 'S' it includes the S and 3 digits. Frankly, I don't even want the unique identifier, and I do not know how it is getting in there. This is what I get:
09/18/17 PURCHASE CVS/PHARM 06062--200 W Manhattan NY P00000 4.87
09/18/17 PURCHASE JUBILEE MARKET NEW YORK NY S467 8.78
09/18/17 PURCHASE LOWE'S #3292 NEW YORK NY P00307 23.39
09/18/17 PURCHASE NYSC WEST END NEW YORK NY S587 39.00
09/18/17 PURCHASE JUBILEE MARKET NEW YORK NY S587 21.73
09/18/17 PURCHASE 7-ELEVEN NEW YORK NY P00000 12.75
09/18/17 PURCHASE DUNKIN #351200 NEW YORK NY S307 2.82
09/18/17 PURCHASE DUNKIN #351200 NEW YORK NY S587 2.82
Eventually I am going to comma-delimit the file and import it into Excel, so I can create bar graphs, pie charts and whatever. This is what I want:
09/18/17 PURCHASE CVS/PHARM 06062--200 W Manhattan NY 4.87
09/18/17 PURCHASE JUBILEE MARKET NEW YORK NY 8.78
09/18/17 PURCHASE LOWE'S #3292 NEW YORK NY 23.39
09/18/17 PURCHASE NYSC WEST END NEW YORK NY 39.00
09/18/17 PURCHASE JUBILEE MARKET NEW YORK NY 21.73
09/18/17 PURCHASE 7-ELEVEN NEW YORK NY 12.75
09/18/17 PURCHASE DUNKIN #351200 NEW YORK NY 2.82
09/18/17 PURCHASE DUNKIN #351200 NEW YORK NY 2.82
I'm a big fan of using .* in regexps as little as possible, because it makes them much easier to read and understand. Also, you can use delimiters other than /, which will allow you to use / in your regexp without escaping it.
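To see how the identifier gets into the output: (.*) is greedy, so it grabs as much as it can and gives back only what the rest of the pattern needs. \w\d{10}\d+ needs just 12 characters, and \w also matches a digit. A P identifier in the sample is 18 characters long, so 6 of them (P plus 5 digits) stay in the capture; an S identifier is 16 characters long, so 4 (S plus 3 digits) stay. A small sketch of this, where the sub names and the anchored variant are my own illustrations and not part of the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merchant text captured by the pattern from the question (third group).
sub place_original {
    my ($row) = @_;
    return $row =~ /(\d{2}\/\d{2}\/\d{2})\s(PURCHASE).*\d{2}\/\d{2}(.*)\w\d{10}\d+.*\$(\d+\.\d+)/
        ? $3 : undef;
}

# Hypothetical variant: the identifier must start after whitespace with P or S
# and run up to the next whitespace, so (.*) cannot stop in the middle of it.
sub place_anchored {
    my ($row) = @_;
    return $row =~ /(\d{2}\/\d{2}\/\d{2})\s(PURCHASE).*\d{2}\/\d{2}\s(.*)\s[PS]\d{10,17}\s.*\$(\d+\.\d+)/
        ? $3 : undef;
}

my $row = '09/18/17 PURCHASE AUTHORIZED ON 09/16 JUBILEE MARKET NEW YORK NY S467259862756690 CARD 4184 $8.78';
print "original: [", place_original($row), "]\n";   # original: [ JUBILEE MARKET NEW YORK NY S467]
print "anchored: [", place_anchored($row), "]\n";   # anchored: [JUBILEE MARKET NEW YORK NY]
```

The {10,17} bound comes from the question's description of the identifier; if real statements contain other lengths, it would need adjusting.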
I'd suggest something like this:
if ($row =~ m!(\d\d/\d\d/\d\d) (PURCHASE) \w+ ON \d\d/\d\d (.*) \w+ CARD \d+\s+\$(\d+\.\d+)$!) {
I've also taken the liberty of changing \d{2} to \d\d. I find that much easier to read, because my brain doesn't have to go into "hey, there has to be a certain number of \d here, the { is important" mode.
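As a rough sketch of how that pattern could feed the comma-delimited file you mention, here it is spread out with the /x modifier and wrapped in a sub. The names parse_row and csv_line are made up for this example, the sample rows are taken from the question, and I am assuming the fields are separated by plain whitespace as in the pasted data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one statement line.
# Returns (date, type, place, amount), or an empty list if the line does not match.
sub parse_row {
    my ($row) = @_;
    return $row =~ m{
        \A (\d\d/\d\d/\d\d)             # posting date
        \s+ (PURCHASE)                  # transaction type
        \s+ \w+ \s+ ON \s+ \d\d/\d\d    # "AUTHORIZED ON mm/dd", not captured
        \s+ (.*)                        # merchant and location
        \s+ \w+                         # transaction identifier, not captured
        \s+ CARD \s+ \d+                # truncated card number, not captured
        \s+ \$ (\d+\.\d+) \s* \z        # amount without the dollar sign
    }x;
}

# Join fields into one CSV line, quoting any field that contains
# a comma, a double quote or a line break.
sub csv_line {
    my @fields = @_;
    for my $f (@fields) {
        if ($f =~ /[",\r\n]/) {
            $f =~ s/"/""/g;
            $f = qq{"$f"};
        }
    }
    return join ',', @fields;
}

my @rows = (
    q{09/18/17 PURCHASE AUTHORIZED ON 09/17 CVS/PHARM 06062--200 W Manhattan NY P00000000032583371 CARD 4184 $4.87},
    q{09/18/17 PURCHASE AUTHORIZED ON 09/16 LOWE'S #3292 NEW YORK NY P00307259724475616 CARD 6029 $23.39},
);

for my $row (@rows) {
    my @fields = parse_row($row) or next;   # skip lines that are not purchases
    print csv_line(@fields), "\n";
}
# 09/18/17,PURCHASE,CVS/PHARM 06062--200 W Manhattan NY,4.87
# 09/18/17,PURCHASE,LOWE'S #3292 NEW YORK NY,23.39
```

The hand-rolled quoting is only there to keep the sketch dependency-free; for a real export, a CPAN module such as Text::CSV is probably the safer choice.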