How to include a data file with a Perl module?

What is the "proper" way to bundle a required-at-runtime data file with a Perl module, such that the module can read its contents before being used?

A simple example would be this Dictionary module, which needs to read a list of (word,definition) pairs at startup.

package Reference::Dictionary;

# TODO: This is the Dictionary, which needs to be populated from
#  data-file BEFORE calling Lookup!
our %Dictionary;

sub new {
  my $class = shift;
  return bless {}, $class;
}

sub Lookup {
  my ($self,$word) = @_;
  return $Dictionary{$word};
}
1;

and a driver program, Main.pl:

use Reference::Dictionary;

my $dictionary = Reference::Dictionary->new;
print $dictionary->Lookup("aardvark");

Now, my directory structure looks like this:

root/
  Main.pl
  Reference/
    Dictionary.pm
    Dictionary.txt

I can't seem to get Dictionary.pm to load Dictionary.txt at startup. I've tried a few methods to get this to work, such as...

  • Using BEGIN block:

     BEGIN {
         open(FP, '<', 'Dictionary.txt') or die "Can't open: $!\n";
         while (<FP>) {
             chomp;
             my ($word, $def) = split(/,/);
             $Dictionary{$word} = $def;
         }
         close(FP);
     }

    No dice: Perl is looking in cwd for Dictionary.txt, which is the path of the main script ("Main.pl"), not the path of the module, so this gives File Not Found.

  • Using DATA:

     BEGIN {
         while (<DATA>) {
             chomp;
             my ($word, $def) = split(/,/);
             $Dictionary{$word} = $def;
         }
         close(DATA);
     }

    and at end of module

     __DATA__
     aardvark,an animal which is definitely not an anteater
     abacus,an oldschool calculator
     ...

    This too fails because BEGIN executes at compile-time, before DATA is available.

  • Hard-code the data in the module

     our %Dictionary = (
         aardvark => 'an animal which is definitely not an anteater',
         abacus   => 'an oldschool calculator'
         ...
     );

    Works, but is decidedly non-maintainable.

Similar question here: How should I distribute data files with Perl modules? but that one deals with modules installed by CPAN, not modules relative to the current script as I'm attempting to do.

There's no need to load the dictionary at BEGIN time. BEGIN time is relative to the file being loaded. When your Main.pl says use Dictionary, all the code in Dictionary.pm is compiled and then run before the rest of Main.pl is compiled. So put the loading code near the top of Dictionary.pm, outside any BEGIN block.

package Dictionary;

use strict;
use warnings;

my %Dictionary;  # There is no need for a global
while (<DATA>) {
    chomp;
    my ($word, $def) = split(/,/);
    $Dictionary{$word} = $def;
}
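
For reference, here is a minimal, self-contained sketch of the DATA approach. It writes a throwaway Dictionary.pm into a temporary directory and loads it with require, so the whole thing runs as one script. The temporary directory, the class-method lookup and the split limit of 2 (so that definitions may contain commas) are assumptions of this sketch, not part of the code above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write a throwaway module whose word list lives after __DATA__.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'Dictionary.pm');

open(my $out, '>', $path) or die "Can't write $path: $!\n";
print {$out} <<'END_MODULE';
package Dictionary;

use strict;
use warnings;

my %Dictionary;

# Top-level code: runs when the file is loaded, after the whole file
# (including the __DATA__ marker) has been compiled.
while (my $line = <DATA>) {
    chomp $line;
    my ($word, $def) = split /,/, $line, 2;
    $Dictionary{$word} = $def;
}
close(DATA);

sub lookup { my ($class, $word) = @_; return $Dictionary{$word} }

1;
__DATA__
aardvark,an animal which is definitely not an anteater
abacus,an oldschool calculator
END_MODULE
close($out) or die "Can't close $path: $!\n";

# Loading the file runs its top-level code, which fills the hash.
require $path;

my $definition = Dictionary->lookup('aardvark');
print "$definition\n";
```

In a real project the module body would simply live in Dictionary.pm; the heredoc is only there to keep the sketch in one file.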

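You can also load from Dictionary.txt located in the same directory. The trick is to build the path from the module's own location rather than from the current working directory. You can get that location from __FILE__, which is the path to the current file (i.e. Dictionary.pm). Note that __FILE__ can itself be a relative path when the module was found through a relative @INC entry; passing it through File::Spec->rel2abs at load time makes it absolute.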

use File::Basename;

# Get the directory Dictionary.pm is located in.
my $dir = dirname(__FILE__);

open(my $fh, '<', "$dir/Dictionary.txt") or die "Can't open: $!\n";

my %Dictionary;
while (<$fh>) {
    chomp;
    my ($word, $def) = split(/,/);
    $Dictionary{$word} = $def;
}
close($fh);

Which should you use? DATA is easier to distribute. A separate, parallel file is easier for non-coders to work on.
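
To check that the __FILE__-relative lookup really is independent of the working directory, the following sketch rebuilds the layout from the question in a temporary directory, changes to a different directory, and only then loads the module. The temporary layout, the write_file helper and the File::Spec->rel2abs call are assumptions added for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: write a string to a file.
sub write_file {
    my ($path, $content) = @_;
    open(my $out, '>', $path) or die "Can't write $path: $!\n";
    print {$out} $content;
    close($out) or die "Can't close $path: $!\n";
}

# Recreate root/Reference/{Dictionary.pm,Dictionary.txt} in a temp dir.
my $root    = tempdir(CLEANUP => 1);
my $pkg_dir = File::Spec->catdir($root, 'Reference');
make_path($pkg_dir);

write_file(File::Spec->catfile($pkg_dir, 'Dictionary.txt'),
    "aardvark,an animal which is definitely not an anteater\n"
  . "abacus,an oldschool calculator\n");

write_file(File::Spec->catfile($pkg_dir, 'Dictionary.pm'), <<'END_MODULE');
package Reference::Dictionary;

use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Locate the data file next to this .pm file, as an absolute path.
my $dir  = dirname(File::Spec->rel2abs(__FILE__));
my $file = File::Spec->catfile($dir, 'Dictionary.txt');

my %Dictionary;
open(my $fh, '<', $file) or die "Can't open $file: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    my ($word, $def) = split /,/, $line, 2;
    $Dictionary{$word} = $def;
}
close($fh);

sub new    { my $class = shift; return bless {}, $class }
sub Lookup { my ($self, $word) = @_; return $Dictionary{$word} }

1;
END_MODULE

# Move somewhere else first: a bare 'Dictionary.txt' would not be found here.
chdir(File::Spec->tmpdir) or die "Can't chdir: $!\n";

unshift @INC, $root;
require Reference::Dictionary;

my $definition = Reference::Dictionary->new->Lookup('abacus');
print "$definition\n";
```

The script prints the definition of "abacus" even though the working directory is not the one containing Dictionary.txt.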


Rather than loading the whole dictionary as soon as the library is loaded, it is more polite to wait and load it only when it is first needed.

use File::Basename;

# Load the dictionary from Dictionary.txt
sub _load_dictionary {
    my %dictionary;

    # Get the directory Dictionary.pm is located in.
    my $dir = dirname(__FILE__);

    open(my $fh, '<', "$dir/Dictionary.txt") or die "Can't open: $!\n";

    while (<$fh>) {
        chomp;
        my ($word, $def) = split(/,/);
        $dictionary{$word} = $def;
    }

    return \%dictionary;
}

# Get the possibly cached dictionary
my $Dictionary;
sub _get_dictionary {
    return $Dictionary ||= _load_dictionary;
}

sub new {
    my $class = shift;

    my $self = bless {}, $class;
    $self->{dictionary} = $self->_get_dictionary;

    return $self;
}

sub lookup {
    my $self = shift;
    my $word = shift;

    return $self->{dictionary}{$word};
}

Each object now contains a reference to the shared dictionary (eliminating the need for a global), which is loaded only once, when the first object is created.
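
The caching behaviour of ||= can be verified with a small self-contained sketch. The package name Lazy::Dictionary, the in-memory word list standing in for Dictionary.txt, and the $LOAD_COUNT counter are all assumptions introduced only to make the effect observable.

```perl
use strict;
use warnings;

package Lazy::Dictionary;    # hypothetical name for this sketch

# Stand-in for Dictionary.txt so the sketch needs no external file.
my $WORD_LIST = "aardvark,an animal which is definitely not an anteater\n"
              . "abacus,an oldschool calculator\n";

our $LOAD_COUNT = 0;         # instrumentation only: number of parses

sub _load_dictionary {
    $LOAD_COUNT++;
    my %dictionary;

    # Read the string through an in-memory filehandle.
    open(my $fh, '<', \$WORD_LIST) or die "Can't open word list: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        my ($word, $def) = split /,/, $line, 2;
        $dictionary{$word} = $def;
    }
    close($fh);

    return \%dictionary;
}

my $Dictionary;              # shared cache, filled on first use
sub _get_dictionary { return $Dictionary ||= _load_dictionary() }

sub new {
    my $class = shift;
    return bless { dictionary => _get_dictionary() }, $class;
}

sub lookup {
    my ($self, $word) = @_;
    return $self->{dictionary}{$word};
}

package main;

my $before = $Lazy::Dictionary::LOAD_COUNT;    # nothing parsed yet
my $first  = Lazy::Dictionary->new;
my $second = Lazy::Dictionary->new;
my $after  = $Lazy::Dictionary::LOAD_COUNT;

printf "loads before: %d, loads after two objects: %d\n", $before, $after;
print $second->lookup('abacus'), "\n";
```

The counter is 0 before any object exists and 1 after two objects have been created, and both objects hold the same hash reference.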

I suggest using DATA with INIT instead of BEGIN to ensure that the data is initialised before run time. It also makes the code more self-documenting.

Alternatively, a UNITCHECK block may be more appropriate. It is executed as early as possible, immediately after the library file is compiled, so it can be considered an extension of the compilation.

package Dictionary;

use strict;
use warnings;

my %dictionary;
UNITCHECK {
    while ( <DATA> ) {
        chomp;
        my ($k, $v) = split /,/;
        $dictionary{$k} = $v;
    }
}
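
To see where these blocks fall relative to each other, here is a small sketch that records the order in which BEGIN, UNITCHECK, INIT and ordinary run-time code execute in a single script. The @phases array is an assumption of the sketch; the blocks are listed out of order on purpose.

```perl
use strict;
use warnings;

our @phases;    # package variable shared by all the blocks below

# Perl runs each block at its own phase, not where it appears in the file.
INIT      { push @phases, 'INIT' }
UNITCHECK { push @phases, 'UNITCHECK' }
BEGIN     { push @phases, 'BEGIN' }

# Ordinary top-level code runs last, at run time.
push @phases, 'run time';

my $order = join ' -> ', @phases;
print "$order\n";
```

Run as a standalone script, this prints BEGIN -> UNITCHECK -> INIT -> run time. One caveat with INIT: it only runs for code compiled before the main program starts, so a module pulled in later with a run-time require will not have its INIT block executed, whereas UNITCHECK still runs right after that module is compiled.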
