
Perl dies on big XML file, using STDIN

I get this error when I run a Perl script:

unclosed token at line 1, column 0, byte 0 at /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level/XML/Parser.pm line 187.

at mysscript.pl line 8.

Here's line 8 of mysscript.pl:

$twig->parse( \*STDIN);

I tried some other variations like:

$twig->parse(\*STDIN);
$twig->parse(*STDIN);

But they didn't work. I know the message also seems to say that something is wrong with my Perl system files, but I doubt that: I found someone who had the same problem, and he had to fix his code instead.

That's an XML error, not a Perl error. It suggests you've got broken XML. You can trap it by wrapping the parse in an eval.
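As a minimal sketch of trapping the error with eval: the helper name try_parse and the sample strings are made up for illustration, and only XML::Twig's new and parse are assumed from the module itself.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns ($twig, undef) on success,
# or (undef, $message) if the parser died on malformed XML.
sub try_parse {
    my ($source) = @_;
    my $twig = XML::Twig->new;
    # parse() dies on XML that is not well-formed; eval traps that.
    my $ok = eval { $twig->parse($source); 1 };
    return $ok ? ( $twig, undef ) : ( undef, $@ || 'unknown error' );
}

# Deliberately broken sample input: <item> is never closed.
my ( $twig, $error ) = try_parse('<root><item>unclosed</root>');
print defined $twig ? "parsed OK\n" : "XML error: $error";
```

XML::Twig also provides safe_parse, which wraps the parse in an eval for you and leaves the error in $@.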

But actually, thinking about it, the problem is probably that you're only reading the first line of STDIN with your parse. Try adding:

{ 
    local $/;
    $twig -> parse ( <STDIN> );
}

However, for large XML files, I quite like XML::Twig, because it has a purge method, which lets you throw away XML you've already processed. One of the downsides of XML is that its memory footprint is approximately 10x the raw file size. So it's possible you're running out of memory if your file is particularly huge.
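A rough sketch of how purge is typically combined with a handler, so that each element is discarded once it has been handled. The function name count_records and the element name record are hypothetical; substitute whatever repeating element your file contains.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: counts elements named $tag, freeing each one
# after it has been handled so memory use stays roughly flat.
sub count_records {
    my ( $source, $tag ) = @_;    # $source: XML string or open filehandle
    my $count = 0;
    my $twig  = XML::Twig->new(
        twig_handlers => {
            $tag => sub {
                my ( $t, $elt ) = @_;
                $count++;         # real processing of $elt would go here
                $t->purge;        # discard everything parsed so far
            },
        },
    );
    $twig->parse($source);
    return $count;
}

print count_records( '<list><record/><record/></list>', 'record' ), "\n";
```

The handler fires as soon as each record element is closed, so the whole document never has to be held in memory at once.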

$twig->parse( \*STDIN) is the proper syntax, so that's not what is causing the error.

So it looks like either there is a problem with your XML or there is a bug somewhere. Did you try checking your XML (with xmlwf or xmllint or a similar tool)? If it parses, then what is the encoding of the XML? If it's UTF-16 then that might be the problem: libexpat (on which XML::Twig is based) seems to have trouble with this encoding.
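If you are unsure about the encoding, one quick check is to look for a byte-order mark at the start of the file. The following is only a sketch: guess_bom_encoding is a made-up name, and a file with no BOM can still be in any encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: guesses the encoding from a leading byte-order mark.
# Returns undef when no BOM is present. Note that a UTF-32LE BOM also
# starts with FF FE, so it would be reported as UTF-16LE here.
sub guess_bom_encoding {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return undef;
}

# Usage: perl check_bom.pl file.xml   (the script name is illustrative)
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    read $fh, my $head, 4;
    close $fh;
    my $encoding = guess_bom_encoding( $head // '' );
    print defined $encoding ? "BOM suggests $encoding\n" : "no BOM found\n";
}
```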

The XML::Twig module has only two basic ways of reading the XML to be parsed:

  • parse, which expects a string containing the XML data as a parameter

  • parsefile, which expects a string that specifies the name (and path) of an XML file to be read

There is no option to pass an open file handle, and if you write $twig->parse(\*STDIN) or $twig->parse(*STDIN) then you will be passing (something like) the strings GLOB(0x44b574) and *main::STDIN respectively, which are clearly not valid XML.

I presume you can work out from there what your call should look like. If you are passing the file name as a parameter on the command line then the simplest solution is to write

$twig->parsefile(shift)
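In a complete script that might look something like the sketch below. The helper name parse_xml_file is made up for illustration, and printing the root tag is just a stand-in for whatever the real script does with the document.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parses the named file and returns the twig.
# parsefile() dies if the file cannot be read or is not well-formed.
sub parse_xml_file {
    my ($path) = @_;
    my $twig = XML::Twig->new;
    $twig->parsefile($path);
    return $twig;
}

# Usage: perl myscript.pl file.xml
if (@ARGV) {
    my $twig = parse_xml_file( shift @ARGV );
    print "root element: ", $twig->root->tag, "\n";
}
```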

but without more information I can't help you any further.
