Sorting lines by those containing numbers, ignoring numbers attached to a letter

I need to sort the lines in a file so that every line containing at least one digit (0-9) is moved to the end of the file. Digits 1-5 do not count when they are immediately preceded by one of these: "a", "e", "g", "i", "n", "o", "r", "u", "v", or "u:" (u followed by a colon).

Here is a sample file:

I want to buy some food.
I want 3 chickens.
I have no3 basket for the eggs.
I have no3 basket which can hold 24 eggs.
Move the king to A3.
Can you move the king to a6?

In the sample file, here are notes on which ones match:

I want to buy some food. % does not match
I want 3 chickens. % matches
I have no3 basket for the eggs. % does not match, because "3" is preceded by "o"
I have no3 basket which can hold 24 eggs. % matches, because contains "24"
Move the king to A3. % matches, digits preceded by an uppercase "A" are not ignored.
Can you move the king to a6? % matches, 6 is not 1-5

The output would place all matching lines at the bottom:

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.

Preferably (although not necessarily), the solution sorts the lines containing the greatest number of matching digits to the end. E.g. "I have 10 chickens and 12 bats." (4 digits) appears after "I have 99 chickens." (2 digits).

Solutions using BASH, Perl, Python 2.7, Ruby, sed, awk, or grep are fine.

If your grep supports the -P (perl-regexp) option:

pat='(?<=[^0-9]|^)((?<!u:)(?<![aeginoruv])[1-5]|[06-9])'

{ grep -vP "$pat" input.txt; grep -P "$pat" input.txt; } >output.txt

If you have ssed (super sed) installed:

ssed -nR '
/(?<=[^0-9]|^)((?<!u:)(?<![aeginoruv])[1-5]|[06-9])/{
    H
    $!d
}
$!p
${
    g
    s/\n//
    p
}' input.txt

When this program is run on your dataset:

#!/usr/bin/env perl    
use strict;
use warnings;

my @moved = ();

my $pat = qr{
      [67890]                   # these big digits anywhere, or else...
    | (?<! [aeginoruv]   )      # none of those letters before
      (?<! u:            )      # nor a "u:" before
      [12345]                   # these little digits
}x;

while (<>) {
    if (/$pat/) {
        push @moved, $_;
    } else {
        print;
    }
}

print @moved;

It produces your desired output:

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
I have no3 basket which can hold 24 eggs.
Move the king to A3.
Can you move the king to a6?

EDIT

To incorporate the sorting, change the final print to this:

print for sort {
    $a =~ y/0-9// <=> $b =~ y/0-9//
} @moved;

And now the output will be this:

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.

This sounds like a job for perl!

Seriously, sed will struggle with the requirement to move "u:" to the end of the file. sed is really line based. Awk could do it, but perl is probably better.

Use \d+ to match a line with digits.

Then use [aeginoruv]\d+ to filter out your letters.

Use u:\d+ to handle your special-case u: stuff (you're going to have to buffer this up [e.g. just store matching lines in an array] so you can output it at the end).
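The buffering idea can be sketched in Python as follows. This is only an illustration, not the answerer's code: it swaps the three separate patterns for a single regex with negative lookbehinds (the approach the other answers use), and the names PATTERN and move_matching_to_end are made up for this example. It runs on Python 2.7 and Python 3.

```python
import re

# Assumption: one combined pattern instead of three separate ones.
# Digits 0 and 6-9 always count; digits 1-5 count unless they are
# immediately preceded by one of the listed letters or by "u:".
PATTERN = re.compile(r"[06-9]|(?<![aeginoruv])(?<!u:)[1-5]")


def move_matching_to_end(lines):
    """Yield non-matching lines right away; buffer matching lines for the end."""
    buffered = []
    for line in lines:
        if PATTERN.search(line):
            buffered.append(line)   # hold back until all input is read
        else:
            yield line
    for line in buffered:
        yield line


sample = [
    "I want to buy some food.",
    "I want 3 chickens.",
    "I have no3 basket for the eggs.",
    "I have no3 basket which can hold 24 eggs.",
    "Move the king to A3.",
    "Can you move the king to a6?",
]
result = list(move_matching_to_end(sample))
for line in result:
    print(line)
```

For the sample file this prints the two lines without counting digits first, followed by the four matching lines in their original order. It does not do the optional sort by digit count.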

[Edited because everyone else had code that accepted a file argument:]

For a non-regex solution in Python, how about

import sys

def keyfunc(s):
    ignores = ("a", "e", "g", "i", "n", "o", "r", "u", "v", "u:")
    return sum(c.isdigit() and not (1 <= int(c) <= 5 and s[:i].endswith(ignores)) 
               for i,c in enumerate(s))

with open(sys.argv[1]) as infile:
    for line in sorted(infile, key=keyfunc):
        print line,

which produces:

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.
I have 99 chickens.
I have 10 chickens and 12 bats.

use strict;
use v5.10.1;
my @matches;
my @no_matches;
while (my $line = <DATA>) {
    chomp $line;

    if ($line =~ / \d+\W/) {
        #say "MATCH $line"; 
        push @matches, $line;
    }
    elsif ($line =~ /u:[1-5]+\b/) {
        #say "NOMATCH   $line"; 
        push @no_matches, $line;
    }
    elsif ($line =~ /[^aeginoruv][1-5]+\b/) {
        #say "MATCH $line"; 
        push @matches, $line;
    }
    elsif ($line =~ /.[6-90]/) {
        #say "MATCH $line"; 
        push @matches, $line;
    }
    else {
        #say "NOMATCH   $line";
        push @no_matches, $line;
    }
}

foreach (@no_matches){
    say $_;
}
foreach (@matches){
    say $_;
}

__DATA__
I want to buy some food.
I want 3 chickens.
I have no3 basket for the eggs.
I have no3 basket which can hold 24 eggs.
What is u:34?                              <- custom test 
Move the king to A3.
Can you move the king to a6?

PROMPT> perl regex.pl

I want to buy some food.
I have no3 basket for the eggs.
What is u:34?                              <- custom test
I want 3 chickens.
I have no3 basket which can hold 24 eggs.
Move the king to A3.
Can you move the king to a6?

Ruby

( Edit : now includes optional sort)

matches = []
non_matches = []
File.open("lines.txt").each do |line|
  if line.match(/[67890]|(?<![aeginoruv])(?<!u:)[12345]/)
    matches.push line
  else
    non_matches.push line
  end
end
puts non_matches + matches.sort_by{|m| m.scan(/\d/).length}

produces:

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.

This might work for you:

sed 'h;s/[aeginoruv][1-5]\|u:[1-5]//g;s/[^0-9]//g;s/^$/0/;G;s/\n/\t/' file |
sort -sn |
sed 's/^[^\t]*\t//'

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.

Basically a three-step move:

  1. Make a numeric key by which to sort the output. Lines that don't need moving get a key of 0; all others get the number formed by their remaining digits.
  2. Sort numerically by that key (-n), keeping the original order of lines with equal keys (-s).
  3. Remove the numeric key.
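The same three steps can be sketched in Python as a decorate/sort/undecorate pass. This is a rough equivalent of the sed pipeline rather than a drop-in replacement; the names IGNORED, NON_DIGIT, sort_key and move_numbered_lines are invented for the example, and it relies on Python's sort being stable in the same way the pipeline relies on sort -s.

```python
import re

# Letter+digit pairs the question says to ignore (1-5 after these letters or "u:").
IGNORED = re.compile(r"[aeginoruv][1-5]|u:[1-5]")
NON_DIGIT = re.compile(r"[^0-9]")


def sort_key(line):
    """Drop ignored pairs, keep the remaining digits, and default to 0."""
    digits = NON_DIGIT.sub("", IGNORED.sub("", line))
    return int(digits) if digits else 0


def move_numbered_lines(lines):
    decorated = [(sort_key(line), line) for line in lines]  # 1. add the key
    decorated.sort(key=lambda pair: pair[0])                # 2. stable numeric sort
    return [line for _, line in decorated]                  # 3. remove the key


sample = [
    "I want to buy some food.",
    "I want 3 chickens.",
    "I have no3 basket for the eggs.",
    "I have no3 basket which can hold 24 eggs.",
    "Move the king to A3.",
    "Can you move the king to a6?",
]
result = move_numbered_lines(sample)
for line in result:
    print(line)
```

As with the sed version, the key is the numeric value of the concatenated digits, not the digit count, so a line containing "99" sorts after one containing "6" even though both have few digits.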
