
Transform data to array with Perl

How do I transform my data into an array with Perl?

Here is my data:

my $data =
  "203.174.38.128203.174.38.129203.174.38.1" .
  "30203.174.38.131203.174.38.132203.174.38" .
  ".133203.174.38.134173.174.38.135203.174." .
  "38.136203.174.38.137203.174.38.142";

And I want to transform it into an array like this:

my @array= (
  "203.174.38.128",
  "203.174.38.129",
  "203.174.38.130",
  "203.174.38.131",
  "203.174.38.132",
  "203.174.38.133",
  "203.174.38.134",
  "173.174.38.135",
  "203.174.38.136",
  "203.174.38.137",
  "203.174.38.142"
);

Does anyone know how to do that with Perl?

If the first part of every logged IP is always 203, it's kinda easy:

my @arr = split /(?<=\d)(?=203\.)/, $data;

In the example given it's not, but the first part is always three digits and the second part is always 174, so it's enough to do...

my @arr = split /(?<=\d)(?=\d{3}\.174\.)/, $data;

... to get the correct result.

But please understand that it's close to impossible to give a more generic (and bulletproof) solution here when these 'marker' parts are too dynamic. For example, take this string...
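As a quick check, here is a minimal runnable sketch of that split applied to the sample data from the question. It assumes, as stated above, that every address starts with a three-digit octet followed by 174.

```perl
use strict;
use warnings;

my $data =
  "203.174.38.128203.174.38.129203.174.38.1" .
  "30203.174.38.131203.174.38.132203.174.38" .
  ".133203.174.38.134173.174.38.135203.174." .
  "38.136203.174.38.137203.174.38.142";

# Split at every zero-width position that is preceded by a digit and
# followed by "three digits, dot, 174, dot", i.e. at the start of each
# address except the first one.
my @arr = split /(?<=\d)(?=\d{3}\.174\.)/, $data;

print "$_\n" for @arr;
```

Because both assertions are zero-width, nothing is consumed by the split and every address comes out intact.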

11.11.11.22222.11.11.11

The question is, where to split it? Should it be 11.11.11.22; 222.11.11.11? Or 11.11.11.222; 22.11.11.11? Both are quite valid IPs, if you ask me. And it could get even worse when trying to split a '2222' part (it can be '2; 222', '22; 22' and even '222; 2').

You can, for example, make a rule: "split each sequence of more than 3 digits followed by a dot so that the second part of the split always starts with 3 digits":
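To make the ambiguity concrete, the following sketch enumerates every way a string can be cut into complete dotted-quad addresses. The helper name all_splits and the octet pattern are my own illustration, not part of the original answer, and the pattern assumes octets are written without leading zeros.

```perl
use strict;
use warnings;

# One octet, 0-255, no leading zeros (an assumption of this sketch).
my $octet = qr/(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)/;
# A whole string that is exactly one dotted-quad address.
my $ip    = qr/\A$octet(?:\.$octet){3}\z/;

# all_splits is a hypothetical helper: it returns one array ref per
# possible way of cutting $str into complete addresses.
sub all_splits {
    my ($str) = @_;
    return ([]) if $str eq '';
    my @found;
    # A dotted quad is between 7 ("1.1.1.1") and 15 characters long.
    for my $len (7 .. 15) {
        last if $len > length $str;
        my $head = substr $str, 0, $len;
        next unless $head =~ $ip;
        # Keep this prefix only if the remainder can be split as well.
        push @found, [ $head, @$_ ] for all_splits(substr $str, $len);
    }
    return @found;
}

print join('; ', @$_), "\n" for all_splits('11.11.11.22222.11.11.11');
```

For the string above this should list both readings, which is exactly why no regex can pick the "right" one without extra knowledge about the data.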

my @arr = split /(?<=\d)(?=\d{3}\.)/, $data;

... but this will obviously fail to work properly in the ambiguous cases mentioned earlier if there are IPs with a two-digit or even one-digit first octet in your data string.
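A small sketch of how that rule behaves on awkward input. The two test strings are made up for illustration: the first is the ambiguous example from above, the second is 10.0.0.1 and 10.0.0.2 written back to back.

```perl
use strict;
use warnings;

my $rule = qr/(?<=\d)(?=\d{3}\.)/;

# Ambiguous run "22222": the rule silently picks one of the two valid
# readings (the one whose second address starts with three digits).
my @ambiguous = split /$rule/, '11.11.11.22222.11.11.11';
print join(' | ', @ambiguous), "\n";

# Second address starts with a two-digit octet: the only "three digits
# plus dot" run follows a dot, so no split point is found at all.
my @short = split /$rule/, '10.0.0.110.0.0.2';
print scalar(@short), " piece(s): @short\n";
```

In the second case the whole string comes back as a single element, so the failure is at least easy to detect by validating each piece afterwards.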

If you write a regex that matches any valid value for one of the four numbers in an address, you can just search for them all and recombine them in sets of four. This

/25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d/

matches 250-255, 200-249, 100-199, 10-99 or 0-9, and a program that uses it is shown below.

There is no way to know which option to take if there is more than one way to split the string; this solution assigns the longest possible value to the first of the two IP addresses. For instance, 1.1.1.1234.1.1.1 will split as 1.1.1.123 and 4.1.1.1.

use strict;
use warnings;

use feature 'say';

my $data =
  "203.174.38.128203.174.38.129203.174.38.1" .
  "30203.174.38.131203.174.38.132203.174.38" .
  ".133203.174.38.134173.174.38.135203.174." .
  "38.136203.174.38.137203.174.38.142";

# One octet: 250-255, 200-249, 100-199, 10-99 or 0-9
my $byte = qr/25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d/;

my @bytes = $data =~ /($byte)/g;
my @addresses;
push @addresses, join('.', splice(@bytes, 0, 4)) while @bytes;

say for @addresses;

Output:

203.174.38.128
203.174.38.129
203.174.38.130
203.174.38.131
203.174.38.132
203.174.38.133
203.174.38.134
173.174.38.135
203.174.38.136
203.174.38.137
203.174.38.142
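The same idea can be wrapped in a small sub that also sanity-checks the result. This is only a sketch: the name split_ips and the "multiple of four" check are my additions, and the octet pattern is one that covers the full 0-255 range.

```perl
use strict;
use warnings;

# One octet, 0-255, longest alternatives first so matching is greedy.
my $octet = qr/25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d/;

# split_ips is a hypothetical helper name, not from the original answer.
sub split_ips {
    my ($data) = @_;
    my @octets = $data =~ /($octet)/g;
    # If greedy matching swallowed a digit that belonged to the next
    # address, the octet count usually stops being a multiple of four.
    die "octet count is not a multiple of four\n" if @octets % 4;
    my @addresses;
    push @addresses, join('.', splice(@octets, 0, 4)) while @octets;
    return @addresses;
}

print "$_\n" for split_ips('1.1.1.1234.1.1.1');
```

The check catches inputs such as 1.1.1.10 followed by 0.1.1.1, where the greedy match reads 100 as one octet and leaves only seven octets in total. It cannot catch cases like the one printed here, where the wrong reading still yields two complete addresses.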

Using your sample, it looks like you always have 3 digits in the first and last octet. That would suggest using this pattern:

/(\d{3}\.\d{1,3}\.\d{1,3}\.\d{3})/

Use it with the /g switch in list context and it will pull out every address.

However, if you have a larger and more divergent set of data than what you show in your sample, somebody should have separated the IPs before dumping them into this string. If they are separate data points, they should have some separation.
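For illustration, a minimal sketch of that pattern with /g in list context. The input here is a shortened excerpt of the question's data, chosen by me to keep the example small.

```perl
use strict;
use warnings;

# Three addresses from the sample, concatenated without separators.
my $data = "203.174.38.128203.174.38.129173.174.38.135";

# In list context, a /g match returns every captured address.
my @ips = $data =~ /(\d{3}\.\d{1,3}\.\d{1,3}\.\d{3})/g;

print "$_\n" for @ips;
```

This relies entirely on the first and last octets being exactly three digits long; with any shorter octet in those positions the matches would drift.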

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address.