
How to read an Excel file with Perl?

Spreadsheet::ParseExcel does the job fine, but I need a way to read a file without it, using only "out of the box" Perl, as I'm unable to install any PM or CPAN module. Does anyone have a suggestion to get me started?

A task that is relatively easy with CPAN modules is actually very difficult without them.

For a start, the Excel binary data (BIFF) is stored inside another binary file format called an OLE compound document. This is like a file system within a file, and the BIFF data might not be stored sequentially. So to start you would have to write a parser just to get the data out.
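As a rough illustration of that first layer, here is a minimal core-Perl sketch that only checks whether a file looks like an OLE compound document and reads the sector size from its header. The helper name `ole_sector_size` is hypothetical, and the header offsets are my reading of the compound file layout rather than anything stated in this thread. It does not follow the sector chains, which is where the real work lies.

```perl
use strict;
use warnings;

# Signature that opens an OLE compound document (and so a binary .xls file).
my $OLE_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# Hypothetical helper: inspect the first bytes of a file's header.
# Returns the sector size in bytes, or nothing if this is not an OLE container.
sub ole_sector_size {
    my ($header) = @_;
    return if length($header) < 32;
    return if substr($header, 0, 8) ne $OLE_MAGIC;

    # Assumed layout: bytes 30-31 hold the sector shift as a
    # little-endian 16-bit integer (9 means 512-byte sectors).
    my $shift = unpack 'v', substr($header, 30, 2);
    return 2**$shift;
}

# Usage: check the file named on the command line, if any.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $header = '';
    read $fh, $header, 512;
    close $fh;

    my $size = ole_sector_size($header);
    print defined $size
        ? "OLE container, $size-byte sectors\n"
        : "Not an OLE compound document\n";
}
```

Everything after this point (allocation tables, directory entries, reassembling the workbook stream from scattered sectors) still has to be written by hand.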

Once the raw BIFF data is extracted you have to parse it to find the cell data. That is a little easier but still has difficulties, such as strings being stored in a hash table away from the cell data, dates that are indistinguishable from plain numbers, and data in merged cells. And everything is still in binary, with bitmasks controlling the meaning of the data structures.
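To give a feel for the second layer, the sketch below walks an already-extracted BIFF stream record by record and decodes one numeric cell. The helper names `biff_records` and `decode_number` are hypothetical, and the record layouts (a 4-byte header, and NUMBER records with type 0x0203) are my understanding of BIFF8, so treat them as assumptions. It needs Perl 5.10 or later for the `d<` pack template.

```perl
use strict;
use warnings;

# Hypothetical helper: split an already-extracted BIFF stream into records.
# Assumed layout: 2-byte type, 2-byte payload length (little-endian), payload.
sub biff_records {
    my ($stream) = @_;
    my @records;
    my $pos = 0;
    while ($pos + 4 <= length($stream)) {
        my ($type, $len) = unpack 'v v', substr($stream, $pos, 4);
        last if $pos + 4 + $len > length($stream);    # truncated record
        push @records, { type => $type, data => substr($stream, $pos + 4, $len) };
        $pos += 4 + $len;
    }
    return @records;
}

# Assumed layout of a NUMBER record (type 0x0203):
# row, column, format index (16 bits each), then an 8-byte IEEE double.
sub decode_number {
    my ($data) = @_;
    my ($row, $col, $xf, $value) = unpack 'v v v d<', $data;
    return { row => $row, col => $col, xf => $xf, value => $value };
}

# Usage with a synthetic one-record stream: row 2, column 1, value 43831.
my $payload = pack 'v v v d<', 2, 1, 0, 43831;
my $stream  = pack('v v', 0x0203, length($payload)) . $payload;

for my $rec (biff_records($stream)) {
    next unless $rec->{type} == 0x0203;
    my $cell = decode_number($rec->{data});
    print "row $cell->{row}, col $cell->{col}: $cell->{value}\n";
}
```

Note that the decoded value alone does not say whether the cell holds the number 43831 or a date (that serial corresponds to 2020-01-01 in the default 1900 date system). Telling them apart means resolving the format index against the workbook's format records, which this sketch does not attempt.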

Fortunately, all these headaches have already been suffered by someone else* and wrapped up in a module so that no one else has to endure them.

So, even if your admins won't install modules for you, there are lots of ways to install modules, or even to install perl itself, locally so that you don't have to bother them. In the end that will probably be the easier solution.

* Me, partially.

OpenDocument is an ISO standard, so you could read the specification and write your own parser for it.

CPAN modules exist because there are lots of things (some simple, some complex) that people want to do that are inappropriate to be part of the core language. Parsing Excel spreadsheets is one of these (one of the more complex ones).

You should fix whatever barrier is preventing you from installing a module to help. It may be managerial (in which case you need to lobby to get the policy changed) or it may be technical (in which case you may just need to learn about local::lib).
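For the technical case, a script can pick up modules installed in your own home directory without any admin involvement. This is a small sketch that assumes the modules were installed under ~/perl5, the default local::lib location; adjust the path if yours differs.

```perl
use strict;
use warnings;

# Assumption: modules were installed under ~/perl5 (local::lib's default).
use lib "$ENV{HOME}/perl5/lib/perl5";

# Load the module only if it is available, and report where it came from.
if (eval { require Spreadsheet::ParseExcel; 1 }) {
    print "Loaded from $INC{'Spreadsheet/ParseExcel.pm'}\n";
}
else {
    print "Spreadsheet::ParseExcel was not found in \@INC\n";
}
```

`use lib` is part of core Perl, so this works even before local::lib itself is installed.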

Export the spreadsheet to a CSV file and parse that, with or without Text::CSV.
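For the "without Text::CSV" route, here is a minimal sketch using Text::ParseWords, which ships with core Perl. The helper name `csv_to_records` is hypothetical. It copes with quoted fields that contain commas, but it is not a full CSV parser: fields with embedded newlines or doubled quotes are not handled the CSV way, and Text::ParseWords also treats apostrophes and backslashes as special characters, so unquoted data such as O'Brien will trip it up.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);    # core module

# Hypothetical helper: turn CSV lines (header line first) into hashrefs
# keyed by the header names.
sub csv_to_records {
    my @lines = @_;
    return unless @lines;
    s/\r?\n\z// for @lines;              # strip Unix or Windows line endings

    my @header = parse_line(',', 0, shift @lines);
    my @records;
    for my $line (@lines) {
        next if $line eq '';
        my @fields = parse_line(',', 0, $line);
        my %row;
        @row{@header} = @fields;         # hash slice: header name => field
        push @records, \%row;
    }
    return @records;
}

# Usage with in-memory lines; the lines of a real file read from a
# filehandle could be passed in the same way.
my @records = csv_to_records(
    "name,city\n",
    qq{"Smith, J",Oslo\n},
);
print "$_->{name} lives in $_->{city}\n" for @records;
```

If the data is anything other than simple, well-behaved CSV, Text::CSV remains the safer choice.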

I'll build on the answer above from @mob regarding Text::CSV. A while back I found Text::CSV::Slurp on CPAN and was an instant convert. It takes a CSV file with a header row and returns an arrayref of hashrefs, where the keys are the names from the header row. Obviously this won't work in all cases, but when it does, your code is simple:

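use strict;
use warnings;
use Text::CSV::Slurp;

my $filename = shift @ARGV;    # assumed here: CSV path given on the command line
my $slurp = Text::CSV::Slurp->new;
my $data  = $slurp->load(file => $filename);    # arrayref of hashrefs
for my $record (@$data) {
    ...
}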
