
How to find nodes case-insensitively using XML::LibXML

I need to find nodes in an XML file, and the element-name matching needs to be case-insensitive. The following code works, but only if all of the element names are in upper case:

use XML::LibXML;

my $dom = XML::LibXML->new->parse_fh(\*DATA);
my $xpc = XML::LibXML::XPathContext->new( $dom->documentElement );
my @invoices = $xpc->findnodes( "/ALLINVOICES/INVOICES/INVOICE" );

__DATA__
<ALLINVOICES>
  <INVOICES>
    <INVOICE number="12345">
       <CUSTOMER>Mr Fubar</CUSTOMER>
    </INVOICE>
  </INVOICES>
</ALLINVOICES>

How do I fix it so that it also matches <allinvoices><invoices><invoice>?

A string preprocessing step that normalizes element names to lowercase might help:

my $xmlstring = '';
{
    local $/;
    $xmlstring = <DATA>;
}

#
# Turns all element names into lowercase.
# Works equally well for uppercase (replace lc with uc).
#
# !!! The usual caveats about processing semi-structured data with regexes apply (i.e. don't attempt more complex transformations just by changing the regex pattern).
#
$xmlstring =~ s#(<[/]?[^/>[:space:]]+)#lc($1)#eg; # all element names

my $dom = XML::LibXML->new->parse_string( $xmlstring);
# ...
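The elided "# ..." part could be completed along these lines: once the tag names are lowercased, a single all-lowercase XPath matches regardless of how the input was cased. This is a minimal sketch; the mixed-case sample document is a hypothetical input, and it reuses the same substitution as above, so the caveats in the note below still apply.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input whose element names differ only in case.
my $xmlstring = <<'XML';
<AllInvoices>
  <invoices>
    <INVOICE number="12345"><Customer>Mr Fubar</Customer></INVOICE>
  </invoices>
</AllInvoices>
XML

# Same substitution as above: lowercase every start/end tag name.
$xmlstring =~ s#(<[/]?[^/>[:space:]]+)#lc($1)#eg;

my $dom = XML::LibXML->new->parse_string($xmlstring);

# After normalization, one lowercase path finds the elements.
my @invoices = $dom->findnodes('/allinvoices/invoices/invoice');
for my $invoice (@invoices) {
    printf "%s: %s\n", $invoice->getAttribute('number'), $invoice->findvalue('customer');
}
```

Attribute names (number) are left alone, because the match stops at the first whitespace after the tag name.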

Note

The solution presented above mishandles comments and CDATA sections (as pointed out by @ikegami). To be safe according to the spec, the first character of an element name must belong to the following character class (the NameStartChar production of XML 1.0):

  [:_a-zA-Z\x{c0}-\x{d6}\x{d8}-\x{f6}\x{f8}-\x{2ff}\x{370}-\x{37d}\x{37f}-\x{1fff}\x{200c}\x{200d}\x{2070}-\x{218f}\x{2c00}-\x{2fef}\x{3001}-\x{d7ff}\x{f900}-\x{fdcf}\x{fdf0}-\x{fffd}\x{10000}-\x{effff}]

This class would be inserted between [/]? and [^/>[:space:]]* in the regex pattern of the code above (note that the repetition modifier changes from + to *, because the first character of the name is now matched by the new class).
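The note above can be sketched as a small function. This is an illustration under assumptions: the helper name lc_element_names is hypothetical, the class is the XML 1.0 NameStartChar production (its members reordered so Perl does not mistake a leading "[:" for a POSIX class), and it is wrapped in (?:...) so Perl cannot read "$name_start[" as an array element. Markup that starts with "<!" or "<?" is no longer touched; text inside a comment or CDATA section that happens to look like a tag would still be rewritten.

```perl
use strict;
use warnings;

# NameStartChar from the XML 1.0 specification.
my $name_start = qr/[A-Za-z_:\x{c0}-\x{d6}\x{d8}-\x{f6}\x{f8}-\x{2ff}\x{370}-\x{37d}\x{37f}-\x{1fff}\x{200c}\x{200d}\x{2070}-\x{218f}\x{2c00}-\x{2fef}\x{3001}-\x{d7ff}\x{f900}-\x{fdcf}\x{fdf0}-\x{fffd}\x{10000}-\x{effff}]/;

# Lowercase start- and end-tag names only. "<!" (comments, CDATA, DOCTYPE)
# and "<?" (processing instructions, the XML declaration) are skipped
# because "!" and "?" are not name start characters.
sub lc_element_names {
    my ($xml) = @_;
    $xml =~ s#(<[/]?(?:$name_start)[^/>[:space:]]*)#lc($1)#eg;
    return $xml;
}

print lc_element_names('<ALLINVOICES><!-- KEEP --><INVOICE><![CDATA[RAW]]></INVOICE></ALLINVOICES>'), "\n";
```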

XML and XPath are always case-sensitive, so you need code that converts the strings to upper or lower case before comparing them. XML::LibXML::XPathContext lets you register additional functions, so you can write a Perl function that you call from XPath with the node and the name you want to compare, and that returns true or false as appropriate:

$xpc->registerFunction('tn', sub {
    my ($nodes, $name) = @_;    # the node-set passed as ".", and the name to compare with
    return lc($nodes->item(0)->localname) eq lc($name) ? XML::LibXML::Boolean->True : XML::LibXML::Boolean->False;
});

my @invoices = $xpc->findnodes('/*[tn(., "allinvoices")]/*[tn(., "invoices")]/*[tn(., "invoice")]');

However, when you write lots of long XPath expressions, this is only slightly shorter than using translate() in XPath, as already suggested in a comment.
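For comparison, the translate() variant can be generated rather than typed out. This is a sketch, not the commenter's exact code: the helper ci_step is hypothetical, and XPath 1.0 translate() only folds the characters explicitly listed, so only ASCII letters are handled. local-name() also ignores any namespace prefix.

```perl
use strict;
use warnings;
use XML::LibXML;

my $dom = XML::LibXML->new->parse_string(<<'XML');
<allinvoices><invoices><Invoice number="12345"/></invoices></allinvoices>
XML

# Build one location step that compares an element's local name
# case-insensitively by folding A-Z to a-z with XPath 1.0 translate().
sub ci_step {
    my ($name) = @_;
    my $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    my $lower = 'abcdefghijklmnopqrstuvwxyz';
    return sprintf q{*[translate(local-name(), '%s', '%s') = '%s']}, $upper, $lower, lc $name;
}

my $path = '/' . join('/', map { ci_step($_) } qw(allinvoices invoices invoice));
my @invoices = $dom->findnodes($path);
print "$path\n";
print scalar(@invoices), " invoice(s) matched\n";
```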

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM