Perl read and write text file with strings

Friends, I need help. Here is my input text file:

Andrew   UK
Cindy    China
Rupa     India
Gordon   Australia
Peter    New Zealand

I want to convert the above into a hash and write the records back to a file when a matching directory exists. I have tried the following, but it does not work.

#!/usr/perl/5.14.1/bin/perl

use strict;
use warnings;
use Data::Dumper;

my %hash = ();
my $file = ".../input_and_output.txt";
my $people;
my $country;

open (my $fh, "<", $file) or die "Can't open the file $file: ";
my $line;
while (my $line =<$fh>) {
  my ($people) = split("", $line);
  $hash{$people} = 1;
}


foreach my $people (sort keys %hash) {

  my @country = $people;
  foreach my $c (@country) {
    my $c_folder = `country/test1_testdata/17.26.6/$c/`;   

    if (-d $cad_root){
      print "Exit\n";
    } else {
      print "NA\n";
    }
 }

This is the primary problem:

my ($people) = split("", $line);

You are splitting on an empty string, which breaks the line into individual characters, and you are assigning the returned list to a single variable, so it ends up holding only the first character of each line.

Instead, you should split on ' ' (a single space character, which is a special pattern):

As another special case, ... when the PATTERN is either omitted or a string composed of a single space character (such as ' ' or "\x20", but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator.
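To make the difference concrete, here is a minimal sketch comparing the three forms of split on one sample line. The variable names are only illustrative and the sample line is taken from the input file in the question.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my $line = "Peter    New Zealand";

# Empty pattern: one field per character; list assignment keeps only the first
my ($first_char) = split "", $line;

# Single-space string: awk-style split on runs of whitespace
my @all_fields = split ' ', $line;

# Same pattern with a limit of 2: the country name stays in one piece
my @two_fields = split ' ', $line, 2;

print "$first_char\n";               # P
print join('|', @all_fields), "\n";  # Peter|New|Zealand
print join('|', @two_fields), "\n";  # Peter|New Zealand
```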

Limit the number of fields returned to ensure the integrity of country names with spaces:

#!/usr/bin/env perl

use strict;
use warnings;

my @people;

while (my $line = <DATA>) {
    $line =~ /\S/ or next;
    $line =~ s/\s+\z//;
    push @people, [ split ' ', $line, 2 ];
}

use YAML::XS;
print Dump \@people;

__DATA__
Andrew   UK
Cindy    China
Rupa     India
Gordon   Australia
Peter    New Zealand

The entries are added to an array so 1) The input order is preserved; and 2) Two people with the same name but from different countries do not result in one entry being lost.

If the order is not important, you could just use a hash keyed on country names with people's names in an array reference for each entry. For now, I am going to assume order matters (it would help us help you if you put more effort into formulating a clear question).

One option is to now go through the list of person-country pairs, and print all those pairs for which the directory country/test1_testdata/17.26.6/$c/ exists (incidentally, in your code you have

my $c_folder = `country/test1_testdata/17.26.6/$c/`;

That will try to execute a program called country/test1_testdata/17.26.6/$c/ and save its output in $c_folder if it produces any. The moral of the story: in programming, precision matters. Just because ` looks like ', that doesn't mean you can use one to mean the other.)
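As a rough sketch of what was probably intended, the path can be built with double quotes, which interpolate $c without running anything, and then tested with -d. The value assigned to $c here is just a stand-in for one country from the input file.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my $c = 'India';    # stand-in value for illustration

# Double quotes build a string; backticks would run it as a command
my $c_folder = "country/test1_testdata/17.26.6/$c/";

# -d is true only if the path exists and is a directory
my $status = -d $c_folder ? 'Exists' : 'NA';

print "$c_folder: $status\n";
```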

Given that your question is focused on hashes, I use an array of references to anonymous hashes to store the list of people-country pairs in the code below. I cache the result of the lookup to reduce the number of times you need to hit the disk.

#!/usr/bin/env perl

use strict;
use warnings;

@ARGV == 2 ? run( @ARGV )
      : die_usage()
;

sub run {
    my $people_data_file = shift;
    my $country_files_location = shift;

    open my $in, '<', $people_data_file
        or die "Failed to open '$people_data_file': $!";

    my @people;
    my %countries;

    while (my $line = <$in>) {
        next unless $line =~ /\S/; # ignore lines consisting of blanks
        $line =~ s/\s+\z//;# remove all trailing whitespace
        my ($name, $country) = split ' ', $line, 2;
        push @people, { name => $name, country => $country };
        $countries{ $country } = undef;
    }

    # At this point, @people has a list of person-country pairs
    # We are going to use %countries to reduce the number of
    # times we need to check the existence of a given directory,
    # assuming that the directory tree is stable while this program
    # is running.

    PEOPLE:
    for my $person ( @people ) {
        my $country = $person->{country};
        if ($countries{ $country }) {
            print join("\t", $person->{name}, $country), "\n";
        }
        elsif (-d "$country_files_location/$country/") {
            $countries{ $country } = 1;
            redo PEOPLE;
        }
    }
}

sub die_usage {
    die "Need data file name and country files location\n";
}

Now, there are a bazillion variations on this, which is why it is important for you to formulate a clear and concise question so that people trying to help you can answer your specific questions, instead of each coming up with his or her own solution to the problem as they see it. For example, one could also do this:

#!/usr/bin/env perl

use strict;
use warnings;

@ARGV == 2 ? run( @ARGV )
      : die_usage()
;

sub run {
    my $people_data_file = shift;
    my $country_files_location = shift;

    open my $in, '<', $people_data_file
        or die "Failed to open '$people_data_file': $!";

    my %countries;

    while (my $line = <$in>) {
        next unless $line =~ /\S/; # ignore lines consisting of blanks
        $line =~ s/\s+\z//;# remove all trailing whitespace
        my ($name, $country) = split ' ', $line, 2;
        push @{ $countries{$country} }, $name;
    }

    for my $country (keys %countries) {
        -d "$country_files_location/$country"
            or delete $countries{ $country };
    }

    # At this point, %countries maps each country for which
    # we have a data file to a list of people. We can then
    # print those quite simply so long as we don't care about
    # replicating the original order of lines from the original
    # data file. People's names will still be sorted in order
    # of appearance in the original data file for each country.

    while (my ($country, $people) = each %countries) {
        for my $person ( @$people) {
            print join("\t", $person, $country), "\n";
        }
    }
}


sub die_usage {
    die "Need data file name and country files location\n";
}

If what you want is a counter of names in a hash, then I got you, buddy!

I won't attempt the rest of the code because you are checking a folder of records that I don't have access to, so I can't troubleshoot anything more than this.

I see one of your problems. Look at this:

#!/usr/bin/env perl
use strict;
use warnings;  
use feature 'say'; # Really like using say instead of print because no need for newline. 
        
my $file = 'input_file.txt';
my $fh; # A filehandle. 
        
my %hash;
my $people;
my $country;
my $line;  
unless(open($fh, '<', $file)){die "Could not open file $file because $!"}
        
while($line = <$fh>)
{
($people, $country) = split(/\s{2,}/, $line); # splitting on at least two spaces
        
say "$people \t $country"; # Just printing out the columns in the file of people and Country.
                
$hash{$people}++; # Just counting all the people in the hash.
                    # Seeing how many unique names there are, like is there more than one Cindy, etc ...?     
}
            
say "\nNow I'm just sorting the hash of people by names.";

foreach(sort{$a cmp $b} keys %hash)
{
say "$_ => $hash{$_}"; # Based on your file. The counter is at 1 because nobody has the same names. 
}

Here is the output. As you can see I fixed the problem by splitting on at least two white-spaces so the country names don't get cut out.

Andrew   UK

Cindy    China

Rupa     India

Gordon   Australia

Peter    New Zealand

Andrew   United States

Now I'm just sorting the hash of people by names.
Andrew => 2
Cindy => 1
Gordon => 1
Peter => 1
Rupa => 1
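One caveat with this approach, sketched below under the assumption that some lines might separate the two columns with only a single space: the /\s{2,}/ pattern finds no separator on such a line, so the whole line lands in the first variable and the country is left undefined. The second sample line is made up for illustration.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# First line uses several spaces; second (hypothetical) line uses only one
my @samples = ("Peter    New Zealand", "Rupa India");
my @results;

for my $line (@samples) {
    # Split only where two or more whitespace characters occur together
    my ($name, $country) = split /\s{2,}/, $line;
    push @results, [ $name, $country ];
    printf "%s => %s\n", $name, defined $country ? $country : '(no country)';
}
```

With the sample file from the question this does not matter, because every line has at least two spaces between the columns.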

I added another Andrew to the file. This Andrew is from the United States as you can see. I see one of your problems. Look at this:

my ($people) = split("", $line);

You are splitting into individual characters because there is nothing between those quotes. With the change below, you are splitting on whitespace instead:

 my ($people) = split(" ", $line);
