
Perl sort with regular expression

I have a Perl array of strings like this:

my @arr = ( "gene1 (100)", "gene2 (50)", "gene3 (120)", ... );

How can I sort the array by the integer in parentheses?

Use a Schwartzian transform to compare the first stand-alone number in each string (the \b anchors skip digits glued to a word, such as the 1 in gene1):

use strict;
use warnings;

my @arr = ( "gene1 (100)", "gene2 (50)", "gene3 (120)");

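my @sorted = map  { $_->[0] }                # 3. keep only the original string
             sort { $a->[1] <=> $b->[1] }    # 2. compare the extracted numbers
             map  { [$_, /\b(\d+)\b/] }      # 1. pair each string with its number
             @arr;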

print "$_\n" for @sorted;

Output:

gene2 (50)
gene1 (100)
gene3 (120)
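To make the three stages of that transform visible, here is the same pipeline split into named steps. This is a sketch: the intermediate variable names (@decorated, @sorted_pairs) are purely illustrative, and the regex is tightened to \((\d+)\) so that only the number inside the parentheses is captured.

```perl
use strict;
use warnings;

my @arr = ( "gene1 (100)", "gene2 (50)", "gene3 (120)" );

# 1. Decorate: pair each string with its sort key, [string, number].
#    In list context the match returns the captured digits.
my @decorated = map { [ $_, /\((\d+)\)/ ] } @arr;

# 2. Sort the pairs numerically by the key stored in slot 1.
my @sorted_pairs = sort { $a->[1] <=> $b->[1] } @decorated;

# 3. Undecorate: keep only the original strings.
my @sorted = map { $_->[0] } @sorted_pairs;

# Show the decorated pairs, then the final order.
print "$_->[0] => $_->[1]\n" for @decorated;
print "$_\n" for @sorted;
```

The decorated pairs print as "gene1 (100) => 100" and so on, which makes it clear that the sort only ever looks at the cached numbers.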

Perl's built-in sort lets you pass a block, a subroutine name, or a code reference as its first argument to define how the elements are compared. Inside that block you can call any function you want.

Since you want to do it with a regular expression, it makes sense to write a sub that matches the number in the parentheses and call it from your sort block.

You need to call it once for $a and once for $b, the two variables that sort sets to the pair of elements being compared each time it compares two values. Use the <=> operator, which compares numbers; with $a on the left and $b on the right it sorts in ascending order.

This is a very verbose version:

use strict;
use warnings;
use Data::Dump;

my @arr = ( "gene1 (100)", "gene2 (50)", "gene3 (120)",  );

dd sort { get_number($a) <=> get_number($b) } @arr;

sub get_number {
  my ( $string ) = @_;
  return $1 if $string =~ m/\((\d+)\)/;
  return 0; # no number found: treat it as 0, so the string sorts first
}

Output:

("gene2 (50)", "gene1 (100)", "gene3 (120)")
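Because sort also accepts a subroutine reference held in a scalar, the comparison can be stored in a variable, and swapping $a and $b reverses the order. A minimal sketch; number_in_parens and $by_number are hypothetical names, and strings without a number are treated as 0 (so they sort first in ascending order), just like get_number above.

```perl
use strict;
use warnings;

my @arr = ( "gene1 (100)", "gene2 (50)", "gene3 (120)" );

# Hypothetical helper: the integer inside the parentheses, or 0 if none.
sub number_in_parens {
    my ($string) = @_;
    return $string =~ /\((\d+)\)/ ? $1 : 0;
}

# A code reference stored in a scalar can be passed to sort in place of a block.
my $by_number = sub { number_in_parens($a) <=> number_in_parens($b) };
my @ascending = sort $by_number @arr;

# Swapping $a and $b reverses the order (largest number first).
my @descending = sort { number_in_parens($b) <=> number_in_parens($a) } @arr;

print join(", ", @ascending), "\n";   # gene2 (50), gene1 (100), gene3 (120)
print join(", ", @descending), "\n";  # gene3 (120), gene1 (100), gene2 (50)
```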

This shows the straightforward way. The sort block sets $aa and $bb to the numbers captured from $a and $b respectively, and then <=> compares them numerically.

There is no need for the much more obscure transform method unless the basic technique proves too slow.

use strict;
use warnings;
use 5.010;

my @arr = ( "gene1 (100)", "gene2 (50)", "gene3 (120)",  );

my @sorted = sort {
  my ($aa) = $a =~ / \(  (\d+)  \) /x;
  my ($bb) = $b =~ / \(  (\d+)  \) /x;
  $aa <=> $bb;
} @arr;

say for @sorted;

Output:

gene2 (50)
gene1 (100)
gene3 (120)
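One way to judge whether the basic technique is "too slow" is to count how often the regex runs. A rough sketch: the $extractions counter, the key_of helper and the generated 1,000-element list are assumptions made for the illustration. The plain sort block extracts a key for both sides of every comparison, while the transform extracts exactly once per element; the exact number of comparisons depends on Perl's sort implementation, so the printed count for the plain block may differ between Perl versions.

```perl
use strict;
use warnings;

# Illustrative input: 1,000 strings whose numbers (0..999) are in scrambled order.
my @arr = map { "gene$_ (" . ( ( $_ * 7919 ) % 1000 ) . ")" } 1 .. 1000;

my $extractions = 0;

# Hypothetical helper that counts every regex extraction it performs.
sub key_of {
    my ($string) = @_;
    $extractions++;
    return $string =~ /\((\d+)\)/ ? $1 : 0;
}

# Plain sort block: the key is re-extracted on every comparison.
$extractions = 0;
my @plain = sort { key_of($a) <=> key_of($b) } @arr;
my $plain_count = $extractions;

# Schwartzian transform: one extraction per element.
$extractions = 0;
my @transformed = map  { $_->[0] }
                  sort { $a->[1] <=> $b->[1] }
                  map  { [ $_, key_of($_) ] } @arr;
my $transform_count = $extractions;

print "plain sort block: $plain_count extractions\n";
print "transform:        $transform_count extractions\n";
```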

The List::UtilsBy CPAN module provides a function, nsort_by, which sorts a list of values into numerical order of the keys returned by running a block of code on each value.

In your case, the block can extract the number in parentheses:

use List::UtilsBy 'nsort_by';

my @sorted = nsort_by { m/\((\d+)\)/ and $1 } @arr;

This is somewhat more efficient than a regular sort call whose block extracts and compares the two numbers from $a and $b directly, because it only has to extract the number from each value once, rather than on every pairwise comparison.
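If installing List::UtilsBy is not an option, a similar helper can be written in core Perl. This is a hypothetical nsort_by_core, not the module's actual implementation: it runs the key block once per value with $_ aliased to that value, sorts the indices by the cached keys, and returns the values in that order.

```perl
use strict;
use warnings;

# Hypothetical core-only stand-in for List::UtilsBy's nsort_by.
# The (&@) prototype lets it be called with a bare block, like sort.
sub nsort_by_core (&@) {
    my ( $keyfunc, @values ) = @_;
    # Compute each key exactly once; map aliases $_ to the current value.
    my @keys = map { $keyfunc->() } @values;
    # Sort the indices by their cached numeric keys, then slice out the values.
    return @values[ sort { $keys[$a] <=> $keys[$b] } 0 .. $#values ];
}

my @arr = ( "gene1 (100)", "gene2 (50)", "gene3 (120)" );

my @sorted = nsort_by_core { m/\((\d+)\)/ ? $1 : 0 } @arr;

print "$_\n" for @sorted;
```

The index-sorting step keeps the original strings untouched, so no decorate/undecorate pass is needed.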
