Transform data to array with Perl

How do I transform my data to an array with Perl?

Here is my data:

my $data =
  "203.174.38.128203.174.38.129203.174.38.1" .
  "30203.174.38.131203.174.38.132203.174.38" .
  ".133203.174.38.134173.174.38.135203.174." .
  "38.136203.174.38.137203.174.38.142";

And I want to transform it into an array like this:

my @array= (
  "203.174.38.128",
  "203.174.38.129",
  "203.174.38.130",
  "203.174.38.131",
  "203.174.38.132",
  "203.174.38.133",
  "203.174.38.134",
  "173.174.38.135",
  "203.174.38.136",
  "203.174.38.137",
  "203.174.38.142"
);

Does anyone know how to do that with Perl?

If the first part of each logged IP is always 203, it's kinda easy:

my @arr = split /(?<=\d)(?=203\.)/, $data;
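To see why the `203` marker alone is not enough for this particular sample, here is a minimal sketch that runs the split above on the question's data. Because 173.174.38.135 is not preceded by a `203.` marker, it stays glued to the address before it, so you get 10 pieces instead of 11.

```perl
use strict;
use warnings;

my $data =
  "203.174.38.128203.174.38.129203.174.38.1" .
  "30203.174.38.131203.174.38.132203.174.38" .
  ".133203.174.38.134173.174.38.135203.174." .
  "38.136203.174.38.137203.174.38.142";

# Split at every position preceded by a digit and followed by "203."
my @arr = split /(?<=\d)(?=203\.)/, $data;

# The 7th piece is "203.174.38.134173.174.38.135": two addresses still joined
print "$_\n" for @arr;
```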

In the example given it's not, but the first part is always 3 digits and the second part is always 174, so it's enough to do...

my @arr = split /(?<=\d)(?=\d{3}\.174\.)/, $data;

... to get the correct result.
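A complete, minimal sketch of the `174` marker version, run against the question's data. It assumes, as the sample suggests, that every address has a 3-digit first octet followed by `.174.`:

```perl
use strict;
use warnings;

my $data =
  "203.174.38.128203.174.38.129203.174.38.1" .
  "30203.174.38.131203.174.38.132203.174.38" .
  ".133203.174.38.134173.174.38.135203.174." .
  "38.136203.174.38.137203.174.38.142";

# Cut wherever a digit is followed by "ddd.174." (a 3-digit first octet,
# then the fixed second octet 174)
my @arr = split /(?<=\d)(?=\d{3}\.174\.)/, $data;

print "$_\n" for @arr;
```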

But please understand that it's close to impossible to give a more generic (and bulletproof) solution here when these 'marker' parts are too dynamic. For example, take this string:

11.11.11.22222.11.11.11

The question is, where do you split it? Should it be 11.11.11.22; 222.11.11.11? Or 11.11.11.222; 22.11.11.11? Both are quite valid IPs, if you ask me. And it could get even worse when trying to split a '2222' part (it could be '2; 222', '22; 22', or even '222; 2').
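One way to make the ambiguity concrete is to try every cut between two adjacent digits and keep only the cuts where both halves are valid addresses. In this sketch, `is_ipv4` is a hypothetical helper written just for the illustration (it accepts four octets of 0-255 without leading zeros). For the string above it finds exactly the two candidate splits.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s is a dotted-quad IPv4 address
# whose octets are 0-255 with no leading zeros
sub is_ipv4 {
    my ($s) = @_;
    my @oct = $s =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/ or return 0;
    for my $o (@oct) {
        return 0 if $o > 255 || $o =~ /^0\d/;
    }
    return 1;
}

my $s = '11.11.11.22222.11.11.11';
my @splits;

# Try a cut between every pair of adjacent digits
for my $i (1 .. length($s) - 1) {
    next unless substr($s, $i - 1, 1) =~ /\d/ && substr($s, $i, 1) =~ /\d/;
    my ($left, $right) = (substr($s, 0, $i), substr($s, $i));
    push @splits, "$left; $right" if is_ipv4($left) && is_ipv4($right);
}

print "$_\n" for @splits;
```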

You can, for example, make a rule: "split each run of more than 3 digits that is followed by a dot, so that the second part of the split always starts with 3 digits":

my @arr = split /(?<=\d)(?=\d{3}\.)/, $data;

... but this will obviously fail in the ambiguous cases mentioned earlier IF your data string contains IPs with a two-digit or even one-digit first octet.
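A minimal sketch of this rule, run both on the question's data and on a made-up string `10.0.0.110.0.0.2` (intended as 10.0.0.1 followed by 10.0.0.2), where the second address has a two-digit first octet. The sample splits into all 11 addresses, while the made-up string is not split at all, because its only 3-digit run `110.` is preceded by a dot rather than a digit.

```perl
use strict;
use warnings;

my $data =
  "203.174.38.128203.174.38.129203.174.38.1" .
  "30203.174.38.131203.174.38.132203.174.38" .
  ".133203.174.38.134173.174.38.135203.174." .
  "38.136203.174.38.137203.174.38.142";

# The generic rule: cut where a digit is followed by "ddd."
my @arr = split /(?<=\d)(?=\d{3}\.)/, $data;

# Hypothetical input with a two-digit first octet: no cut point is found
my @bad = split /(?<=\d)(?=\d{3}\.)/, '10.0.0.110.0.0.2';

print "$_\n" for @arr;
print "unsplit: $_\n" for @bad;
```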

If you write a regex that will match any valid value for one of the numbers in the quartet, then you can just search for them all and recombine them in sets of four. This

/25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d/

matches 250-255, 200-249, 100-199, 10-99 or 0-9, and a program that uses it is shown below.
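Since this alternation is easy to get wrong, a quick sanity check is to anchor it and test it against every number from 0 to 999. In this sketch the pattern accepts exactly the 256 values 0-255 and rejects leading-zero forms such as `05`, while a narrower `2[0-5][0-5]` first alternative would reject 20 legitimate values (206-209, 216-219, 226-229, 236-239 and 246-249).

```perl
use strict;
use warnings;

# Octet pattern: 250-255, 200-249, 100-199, 10-99, 0-9
my $byte = qr/25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d/;

# Same idea, but with the narrower 2[0-5][0-5] for the 200s
my $narrow = qr/2[0-5][0-5]|1\d\d|[1-9]\d|\d/;

# Anchor each pattern so the whole string must be a single octet
my @accepted     = grep { /^(?:$byte)$/ } 0 .. 999;
my @leading_zero = grep { /^(?:$byte)$/ } qw(00 05 010 099);
my @narrow_miss  = grep { !/^(?:$narrow)$/ } 0 .. 255;

printf "accepted %d values, %d..%d\n", scalar(@accepted), $accepted[0], $accepted[-1];
print "narrow pattern misses: @narrow_miss\n";
```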

There is no way to know which option to take if there is more than one way to split the string; this solution assigns the longest possible value to the first of the two IP addresses. For instance, 1.1.1.1234.1.1.1 will split as 1.1.1.123 and 4.1.1.1.

use strict;
use warnings;

use feature 'say';

my $data =
  "203.174.38.128203.174.38.129203.174.38.1" .
  "30203.174.38.131203.174.38.132203.174.38" .
  ".133203.174.38.134173.174.38.135203.174." .
  "38.136203.174.38.137203.174.38.142";

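# Octet pattern, longest alternatives first: 250-255, 200-249, 100-199, 10-99, 0-9
my $byte = qr/25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d/;

# Pull every octet out of the string in order (the dots are simply skipped)
my @bytes = $data =~ /($byte)/g;
my @addresses;
# Regroup the octets four at a time into dotted-quad addresses
push @addresses, join('.', splice(@bytes, 0, 4)) while @bytes;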

say for @addresses;

output

203.174.38.128
203.174.38.129
203.174.38.130
203.174.38.131
203.174.38.132
203.174.38.133
203.174.38.134
173.174.38.135
203.174.38.136
203.174.38.137
203.174.38.142
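Applying the same technique to the ambiguous 1.1.1.1234.1.1.1 example confirms the longest-first behaviour. This sketch also adds a guard (an addition, not part of the program above) that dies if the number of octets found is not a multiple of four, which would otherwise produce a truncated last address.

```perl
use strict;
use warnings;

my $byte = qr/25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d/;

my $ambiguous = '1.1.1.1234.1.1.1';
my @bytes = $ambiguous =~ /($byte)/g;

# Guard: leftover octets mean the input did not match our assumptions
die "found ", scalar(@bytes), " octets, not a multiple of 4\n" if @bytes % 4;

my @addresses;
push @addresses, join('.', splice(@bytes, 0, 4)) while @bytes;

# "1234" is consumed as "123" then "4", so the first address gets the longer octet
print "$_\n" for @addresses;
```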

Judging by your sample, the first and last octets always have 3 digits. That suggests this pattern:

/(\d{3}\.\d{1,3}\.\d{1,3}\.\d{3})/

Use it with the /g modifier in list context and it will pull out every address.
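A minimal sketch of this approach on the question's data. Like the pattern itself, it only works while every first and last octet has exactly 3 digits:

```perl
use strict;
use warnings;

my $data =
  "203.174.38.128203.174.38.129203.174.38.1" .
  "30203.174.38.131203.174.38.132203.174.38" .
  ".133203.174.38.134173.174.38.135203.174." .
  "38.136203.174.38.137203.174.38.142";

# m//g in list context returns every captured match, left to right
my @addresses = $data =~ /(\d{3}\.\d{1,3}\.\d{1,3}\.\d{3})/g;

print "$_\n" for @addresses;
```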

However, if your real data set is larger and more varied than this sample, somebody should have separated the IPs before dumping them into this string. If they are separate data points, they should be stored with some kind of separator.
