
Perl - count words of a file

I want to count the words in a file and report how many times each word occurs.

My script:

#!/usr/bin/perl

#use strict;
#use warnings;

use POSIX qw(strftime);
$datestring = strftime "%Y-%m-%d", localtime;

print $datestring;

my @files = <'/mnt/SESSIONS$datestring*'>;
my $latest;

foreach my $file (@files) {
  $latest = $file if $file gt $latest;
}

@temp_arr=split('/',$latest);

open(FILE,"<$latest");
print "file loaded \n";
my @lines=<FILE>;
close(FILE);

#my @temp_line;

foreach my $line(@lines) {

    @line=split(' ',$line);
    #push(@temp_arr);

    $line =~ s/\bNT AUTHORITY\\SYSTEM\b/NT__AUTHORITY\\SYSTEM/ig;   

    print $line;

    #print "$line[0] $line[1] $line[2] $line[3] $line[4] $line[5] \n";

}

My log file

SID        USER                      TERMINAL        PROGRAM
---------- ------------------------- --------------- -------------------------
         1 SYSTEM                    titi            toto (fifi)
         2 SYSTEM                    titi            toto (fofo)
         4 SYSTEM                    titi            toto (bobo)
         5 NT_AUTHORITY\SYSTEM       titi            roro
         6 NT_AUTHORITY\SYSTEM       titi            gaga
         7 SYSTEM                    titi            gogo (fifi)

         5 rows selected.

The result I want:

User = 3 SYSTEM with program toto
User = 1 SYSTEM with program gogo

Thanks for any information.

I see yours as a two-step problem: you want to parse the log files, and then you want to store parts of that data in a data structure that you can use for counting.

This is a guess based on your sample data, but if your data is fixed-width, one way to parse it into fields is unpack. I think substr might be more efficient, so consider how many files you need to parse and how long each one is.
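For comparison, here is a minimal sketch of the substr approach. The column offsets and widths are assumptions read off the dashed ruler in the sample log, and parse_fixed is a hypothetical helper name, so treat this as an illustration rather than a drop-in parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# [offset, width] per column. These numbers are assumptions taken from the
# sample log: SID (10), USER (25), TERMINAL (15), PROGRAM (25), with one
# space between columns.
my @layout = ( [ 0, 10 ], [ 11, 25 ], [ 37, 15 ], [ 53, 25 ] );

# parse_fixed is a hypothetical helper, not part of any module.
sub parse_fixed {
    my ($line) = @_;
    chomp $line;
    my @fields;
    for my $col (@layout) {
        my ( $offset, $width ) = @$col;

        # substr warns when the offset is past the end of the string,
        # so short lines get an empty field instead.
        my $field = $offset < length($line) ? substr( $line, $offset, $width ) : '';
        $field =~ s/^\s+|\s+$//g;    # trim the padding on both sides
        push @fields, $field;
    }
    return @fields;
}

# Build a row with the same layout as the sample log.
my $row = sprintf '%10s %-25s %-15s %s', 1, 'SYSTEM', 'titi', 'toto (fifi)';
my ( $sid, $user, $terminal, $program ) = parse_fixed($row);
print "$sid|$user|$terminal|$program\n";    # prints 1|SYSTEM|titi|toto (fifi)
```

Whether this is actually faster than unpack depends on your data, so it is worth benchmarking on a real log before choosing.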

I would store the data in a hash, then dereference and print it once all the files have been read.

my %counts;

open my $IN, '<', 'logfile.txt' or die "Cannot open logfile.txt: $!";
while (<$IN>) {
  next if length ($_) < 54;       # @53 below needs the line to reach the PROGRAM column
  # SID is 10 characters wide in your sample, so read A10 rather than A9
  my ($sid, $user, $terminal, $program) = unpack 'A10 @11 A25 @37 A15 @53 A25', $_;

  next unless $sid =~ /^\s*\d+$/; # keep data rows only; drops the header and the dashed ruler

  $program =~ s/\s*\(.*//;        # based on your example, turn toto (fifi) into toto

  $counts{$user}{$program}++;     # nested hash: one counter per user/program pair
}
close $IN;

while (my ($user, $ref) = each %counts) {
  while (my ($program, $count) = each %$ref) {
    print "User = $count $user with program $program\n";
  }
}

Output from the program (hash order is not guaranteed, so the lines may appear in a different order):

User = 3 SYSTEM with program toto
User = 1 SYSTEM with program gogo
User = 1 NT_AUTHORITY\SYSTEM with program roro
User = 1 NT_AUTHORITY\SYSTEM with program gaga
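Perl does not guarantee the order in which each or keys walks a hash, so the order of these lines can change between runs. If you need a stable report, one option is to sort the keys before printing. A minimal sketch, assuming the counts hash has already been filled; the sample values are copied from the output above and report_lines is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample counts copied from the output shown above, for illustration only.
my %counts = (
    'SYSTEM'              => { toto => 3, gogo => 1 },
    'NT_AUTHORITY\SYSTEM' => { roro => 1, gaga => 1 },
);

# report_lines is a hypothetical helper, not part of any module.
sub report_lines {
    my ($counts) = @_;
    my @lines;
    for my $user ( sort keys %$counts ) {
        my $programs = $counts->{$user};

        # Highest count first; ties are broken alphabetically by program name.
        my @sorted = sort { $programs->{$b} <=> $programs->{$a} or $a cmp $b }
            keys %$programs;
        for my $program (@sorted) {
            push @lines, "User = $programs->{$program} $user with program $program";
        }
    }
    return @lines;
}

print "$_\n" for report_lines( \%counts );
```

With the sample data this lists the NT_AUTHORITY\SYSTEM rows first (gaga, then roro), followed by SYSTEM with toto and then gogo.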

This code automatically detects the widths of the input fields (your snippet looks like the output of an Oracle query) and prints the results:

#!/usr/bin/perl

use strict;
use warnings;
use v5.10;

open my $file, '<', 'input.log' or die "Cannot open input.log: $!";

my $data = {};
my @cols_size = ();

while (<$file>) {

    my $line = $_;

    if ( $line =~ /--/) {
        foreach (split(/\s/, $line)) {
            push(@cols_size, length($_) +1);
        }
        next;
    }

    next unless (@cols_size);
    next     if ($line =~ /rows selected/);

    my ($sid, $user, $terminal, $program) = unpack('A' . join('A', @cols_size), $line);
    next unless ($sid);

    $program =~ s/\s*\(\w+\)//;    # drop the "(fifi)" suffix and the space before it

    $data->{$user}->{$program}++;

}

close $file;

foreach my $user (keys %{$data}) {
    foreach my $program (keys %{$data->{$user}}) {
        say sprintf("User = %s %s with program %s", $data->{$user}->{$program}, $user, $program);
    }
}

I don't understand $counts{$user}{$program}++;
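In case it helps: that line increments a counter stored in a hash of hashes, keyed first by user and then by program. Perl creates the inner hash the first time a user is seen (this is called autovivification), and ++ treats a missing count as 0. The sketch below shows the shorthand next to a long-hand version that does the same thing; the rows are hypothetical stand-ins for parsed log lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical (user, program) pairs standing in for parsed log rows.
my @rows = ( [ 'SYSTEM', 'toto' ], [ 'SYSTEM', 'toto' ], [ 'SYSTEM', 'gogo' ] );

# Shorthand, as used in the answer.
my %counts;
for my $row (@rows) {
    my ( $user, $program ) = @$row;

    # $counts{$user} springs into existence as a hash reference on first
    # use, and the undefined counter is treated as 0 before incrementing.
    $counts{$user}{$program}++;
}

# The same logic spelled out step by step, into a second hash.
my %long;
for my $row (@rows) {
    my ( $user, $program ) = @$row;
    $long{$user} = {} unless exists $long{$user};
    $long{$user}{$program} = 0 unless exists $long{$user}{$program};
    $long{$user}{$program} = $long{$user}{$program} + 1;
}

print "toto: $counts{SYSTEM}{toto}, gogo: $counts{SYSTEM}{gogo}\n";    # prints toto: 2, gogo: 1
```

Both hashes end up with the same contents; the one-line form is simply the idiomatic way to write it.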
