Perl regex & data extraction/manipulation

I'm not sure where to start with this one. My client gets stock figures from his supplier, but they are now being sent in a different format. Here is a sample snippet:

[["BLK",[["Black","0F1315"]],[["S","813"],["M","1378"],["L","1119"],["XL","1069"],["XXL","412"],["3XL","171"]]],["BOT",[["Bottle","15451A"]],[["S","226"],["M","425"],["L","772"],["XL","509"],["XXL","163"]]],["BUR",[["Burgundy","73002E"]],[["S","402"],["M","530"],["L","356"],["XL","257"],["XXL","79"]]],["DNA",[["Deep Navy","000F33"]],[["S","699"],["M","1161"],["L","1645"],["XL","1032"],["XXL","350"]]],["EME",[["Emerald","0DAB5E"]],[["S","392"],["M","567"],["L","613"],["XL","431"],["XXL","97"]]],["HEA",[["Heather","C0D4D7"]],[["S","374"],["M","447"],["L","731"],["XL","386"],["XXL","115"],["3XL","26"]]],["KEL",[["Kelly","0FFF00"]],[["S","167"],["M","285"],["L","200"],["XL","98"],["XXL","45"]]],["NAV",[["Navy","002466"]],[["S","451"],["M","1389"],["L","1719"],["XL","1088"],["XXL","378"],["3XL","177"]]],["NPU",[["Purple","560D55"]],[["S","347"],["M","553"],["L","691"],["XL","230"],["XXL","101"]]],["ORA",[["Orange","FF4700"]],[["S","125"],["M","273"],["L","158"],["XL","98"],["XXL","98"]]],["RED",[["Red","FF002E"]],[["S","972"],["M","1186"],["L","1246"],["XL","889"],["XXL","184"]]],["ROY",[["Royal","1500CE"]],[["S","1078"],["M","1346"],["L","1102"],["XL","818"],["XXL","135"]]],["SKY",[["Sky","91E3FF"]],[["S","567"],["M","919"],["L","879"],["XL","498"],["XXL","240"]]],["SUN",[["Sunflower","FFC700"]],[["S","843"],["M","1409"],["L","1032"],["XL","560"],["XXL","53"]]],["WHI",[["White","FFFFFF"]],[["S","631"],["M","2217"],["L","1666"],["XL","847"],["XXL","410"],["3XL","74"]]]]

First, the initial [ and final ] can be removed.
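A minimal sketch of this first step with a single substitution, assuming the whole feed is held in one string. `strip_outer_brackets` is a hypothetical name used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: remove the single outermost [ ... ] pair.
# The greedy (.*) runs to the LAST ], and /s lets . match newlines
# in case the feed arrives wrapped over several lines.
sub strip_outer_brackets {
    my ($str) = @_;
    $str =~ s/^\s*\[(.*)\]\s*$/$1/s;
    return $str;
}

my $feed = '[["BLK",[["Black","0F1315"]],[["S","813"]]]]';
print strip_outer_brackets($feed), "\n";   # ["BLK",[["Black","0F1315"]],[["S","813"]]]
```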

Then it needs to be broken down into colour segments, i.e.:

["BLK",[["Black","0F1315"]],[["S","813"],["M","1378"],["L","1119"],["XL","1069"],["XXL","412"],["3XL","171"]]]

The BLK is needed here; the next block, [["Black","0F1315"]], can be disregarded.
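Picking out the colour code from one segment can be sketched with a single anchored match, assuming the code is always the first quoted string in the segment (`colour_code` is a hypothetical name). The name/hex block is skipped simply by not capturing it:

```perl
use strict;
use warnings;

# Hypothetical helper: the colour code is the first quoted string in a
# segment; the [["Black","0F1315"]] block after it is never captured.
sub colour_code {
    my ($segment) = @_;
    my ($code) = $segment =~ /^\["([^"]+)"/;   # list context returns the capture
    return $code;                              # undef if the segment doesn't match
}

my $segment = '["BLK",[["Black","0F1315"]],[["S","813"],["M","1378"]]]';
print colour_code($segment), "\n";   # BLK
```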

Next, I need to take the stock data for each size: ["S","813"], etc.
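A sketch of pulling the size/quantity pairs out of one segment, assuming the layout of the sample (no whitespace, no escaped quotes). The name/hex block is stripped first because some hex values, such as Navy's "002466", are all digits and would otherwise look like a quantity. `size_pairs` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: return [SIZE, QTY] pairs from one colour segment.
sub size_pairs {
    my ($segment) = @_;
    # Drop the leading ["CODE",[["Name","HEX"]], part so an all-digit
    # hex value is not mistaken for a size/quantity pair.
    ( my $sizes = $segment ) =~ s/^\["[^"]+",\[\[[^\]]*\]\],//;
    my @pairs;
    # /g in scalar context walks the string one ["SIZE","QTY"] at a time,
    # so it copes with however many sizes this colour has.
    while ( $sizes =~ /\["([^"]+)","(\d+)"\]/g ) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

for my $pair ( size_pairs('["NAV",[["Navy","002466"]],[["S","451"],["M","1389"]]]') ) {
    print "$pair->[0] => $pair->[1]\n";
}
```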

That should give me data such as:

 $col = BLK
 $size = S
 $qty = 813

 $col = BLK
 $size = M
 $qty = 1378

and so on, repeating this for every colour segment in the data.

The number of colour segments in the data will vary, as will the number of sizing segments within each one. The number of sizing segments also varies from colour to colour, i.e. there may be 6 sizes for BLK but only 5 for RED.

The data will be written out inside the loop, so something like print "$col:$size:$qty" will be fine, as that puts it in a format ready to be processed.
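Putting the steps above together, a regex-only sketch could look like this. It assumes the feed always has exactly the layout of the sample (no whitespace, no escaped quotes, quoted digit quantities) and is brittle if the supplier changes anything; `stock_lines` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: regex-only version of the steps described above.
sub stock_lines {
    my ($feed) = @_;
    $feed =~ s/^\s*\[(.*)\]\s*$/$1/s;    # drop the outermost [ ... ]
    my @lines;
    # One match per colour segment: capture the code and the sizes block,
    # skipping the [["Name","HEX"]] block in between.
    while ( $feed =~ /\["(\w+)",\[\[[^\]]*\]\],\[((?:\["[^"]*","\d+"\],?)*)\]\]/g ) {
        my ( $col, $sizes ) = ( $1, $2 );   # save before the inner match resets $1/$2
        # Inner loop: however many ["SIZE","QTY"] pairs this colour has.
        while ( $sizes =~ /\["([^"]+)","(\d+)"\]/g ) {
            my ( $size, $qty ) = ( $1, $2 );
            push @lines, "$col:$size:$qty";
        }
    }
    return @lines;
}

my $feed = '[["BLK",[["Black","0F1315"]],[["S","813"],["M","1378"]]],'
         . '["NAV",[["Navy","002466"]],[["S","451"],["3XL","177"]]]]';
print "$_\n" for stock_lines($feed);
```

This prints BLK:S:813, BLK:M:1378, NAV:S:451 and NAV:3XL:177, one per line.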

Sorry for the long message; I just can't seem to get my head round this today!!

Regards,

Stu

This looks like valid JSON to me. Why not use a JSON parser instead of trying to solve this with a regex?

use strict;
use warnings;
use JSON;    # CPAN module; provides from_json()
my $json_string = '[["BLK",[["Black","0F1315"]],[["S","813"...<snip>';
my $deserialized = from_json( $json_string );    # returns an array reference

Then you can iterate over the array and extract the pieces of information you need.
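A minimal sketch of that iteration. It swaps in JSON::PP, which has shipped with Perl since 5.14, in case the CPAN JSON module isn't installed; `stock_lines_json` is a hypothetical name:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: decode the feed, then walk colour -> sizes.
# Each top-level element is [ CODE, [[NAME, HEX]], [[SIZE, QTY], ...] ].
sub stock_lines_json {
    my ($json_string) = @_;
    my $data = decode_json($json_string);    # dies on malformed JSON
    my @lines;
    for my $colour (@$data) {
        my ( $code, undef, $sizes ) = @$colour;    # middle element ignored
        for my $pair (@$sizes) {
            my ( $size, $qty ) = @$pair;
            push @lines, "$code:$size:$qty";
        }
    }
    return @lines;
}

my $json_string = '[["BLK",[["Black","0F1315"]],[["S","813"],["M","1378"]]],'
                . '["RED",[["Red","FF002E"]],[["S","972"]]]]';
print "$_\n" for stock_lines_json($json_string);
```

Because the parser handles the nesting, varying numbers of colours and sizes need no special treatment.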

Building on Tim Pietzcker's answer:

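...
my $deserialized = from_json( $json_string );
foreach my $group ( @$deserialized ) {
    # each group is [ CODE, [[NAME, HEX]], [[SIZE, QTY], ...] ]
    my ( $color, undef, $sizes ) = @$group;
    # one "CODE:SIZE:QTY" line per size pair
    print join( ":", $color, @$_ ), "\n" for @$sizes;
}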

(And yes, for this particular format, eval should do as well as from_json, although the latter is safer: it only parses data and never executes code found in the input. However, you should really try to find an official spec for the format: is it really JSON or something else?)
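To illustrate the safety difference, here is a small sketch with a deliberately tampered feed (the Perl expression `scalar localtime` stands in for something harmful). String eval runs it; JSON::PP, swapped in here as a core-module parser, rejects it. `is_valid_json` is a hypothetical name:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: 1 if the text parses as JSON, 0 otherwise.
# decode_json only builds data structures; it never runs code.
sub is_valid_json {
    my ($text) = @_;
    return eval { decode_json($text); 1 } ? 1 : 0;
}

my $tampered = '[["BLK",[["S", scalar localtime]]]]';

print is_valid_json('[["BLK",[["S","813"]]]]'), "\n";   # 1
print is_valid_json($tampered), "\n";                   # 0

# String eval happily executes the embedded Perl code:
my $via_eval = eval $tampered;
print $via_eval->[0][1][0][1], "\n";   # prints the current date and time
```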

Assuming your data is in $str, you can eval(EXPR) it (Danger, Will Robinson!) and process the resulting data structure. This works because the sample also happens to be valid Perl syntax for nested array references:

my $struct = eval $str;    # runs $str as Perl code: only safe for trusted input
die "Could not parse data: $@" if $@;

foreach my $cref (@$struct) {
    my($color, undef, $sizerefs) = @$cref; # 3 elements in each top-level entry
    foreach my $sizeref (@$sizerefs) {
        my($size, $qty) = @$sizeref;
        print "$color:$size:$qty\n";
    }
}

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.