I need to find nodes in an XML file, and the matching needs to be case-insensitive. The following code works, but only if none of the element names are in lower case:
my $dom = XML::LibXML->new->parse_fh(*DATA);
my $xpc = XML::LibXML::XPathContext->new( $dom->documentElement );
my @invoices = $xpc->findnodes( "/ALLINVOICES/INVOICES/INVOICE" );
__DATA__
<ALLINVOICES>
<INVOICES>
<INVOICE number="12345">
<CUSTOMER>Mr Fubar</CUSTOMER>
</INVOICE>
</INVOICES>
</ALLINVOICES>
How do I fix it so that it also accepts <allinvoices><invoices><invoice>?
A string preprocessing stage to normalize element names into lowercase might help you:
my $xmlstring = '';
{
local $/;
$xmlstring = <DATA>;
}
#
# Turns all element names into lowercase.
# Works as well with uppercase ( replace lc with uc )
#
# !!! The usual caveats wrt processing semistructured data with regexen apply (ie. don't try more complex transformations purely by changing the regex pattern )
#
$xmlstring =~ s#(<[/]?[^/>[:space:]]+)#lc($1)#eg; # all element names
my $dom = XML::LibXML->new->parse_string( $xmlstring);
# ...
Note
The presented solution handles comments and CDATA sections incorrectly (as pointed out by @ikegami). In order to be safe according to the specs, the first character of an element name must belong to the following character class:
[:_a-zA-Z\x{c0}-\x{d6}\x{d8}-\x{f6}\x{f8}-\x{2ff}\x{0370}-\x{037d}\x{037f}-\x{1fff}\x{200c}\x{200d}\x{2070}-\x{218f}\x{2c00}-\x{2fef}\x{3001}-\x{d7ff}\x{f900}-\x{fdcf}\x{fdf0}-\x{fffd}\N{U+10000}-\N{U+EFFFF}]
This monster would be inserted between [/]? and [^/>[:space:]]* (observe the changed repetition modifier) in the regex pattern of the code section above.
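If the comment and CDATA caveat is a concern, one alternative (not part of the answer above, just a sketch) is to parse the document first and rename the elements through the DOM, so that only real element nodes are touched. It assumes the input is small enough to load fully, that no namespaces are involved, and that `setNodeName` from XML::LibXML::Node is acceptable for this purpose; the sample data is made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample input with mixed-case element names, a comment and a CDATA section.
my $xml = <<'XML';
<ALLINVOICES>
  <!-- <NOT_AN_ELEMENT> inside a comment stays untouched -->
  <Invoices>
    <INVOICE number="12345"><CUSTOMER><![CDATA[<Mr Fubar>]]></CUSTOMER></INVOICE>
  </Invoices>
</ALLINVOICES>
XML

my $dom = XML::LibXML->load_xml( string => $xml );

# '//*' selects element nodes only, so comments and CDATA content are never renamed.
for my $el ( $dom->findnodes('//*') ) {
    $el->setNodeName( lc $el->nodeName );
}

# After renaming, a plain lowercase path is enough.
my @invoices = $dom->findnodes('/allinvoices/invoices/invoice');
print scalar(@invoices), " invoice(s) found\n";
```

Attribute names are left as they are here; folding them as well would need a similar loop over each element's attributes.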
XML and XPath are always case-sensitive, so you would need to write code that converts the names to upper or lower case before comparing them. I think XML::LibXML::XPathContext allows you to register additional functions, so you could write a function in Perl that you call from XPath with the node and the name you want to compare, and that returns true or false as needed:
$xpc->registerFunction( 'tn', sub {
    my ( $node, $name ) = @_;
    # $node is passed in as a node list; compare the lowercased local name of its node
    return lc( $node->item(0)->localName ) eq $name
        ? XML::LibXML::Boolean->True
        : XML::LibXML::Boolean->False;
} );
my @invoices = $xpc->findnodes('/*[tn(., "allinvoices")]/*[tn(., "invoices")]/*[tn(., "invoice")]');
That is, however, only slightly shorter than using translate in XPath, as already suggested in a comment, when writing (lots of) long XPath expressions.
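For comparison, here is a minimal sketch of the translate variant mentioned above. The helper `ci_step` is a hypothetical name introduced only for this example, the sample data is made up, and only ASCII letters are folded, so element names with other letters would need a longer mapping.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample input with element names in mixed case.
my $xml = <<'XML';
<AllInvoices>
  <invoices>
    <INVOICE number="12345"><CUSTOMER>Mr Fubar</CUSTOMER></INVOICE>
    <invoice number="67890"><customer>Ms Quux</customer></invoice>
  </invoices>
</AllInvoices>
XML

# Hypothetical helper: builds one case-insensitive location step for $name.
# translate() replaces each character of its second argument with the
# character at the same position in its third argument (ASCII letters only here).
sub ci_step {
    my ($name) = @_;
    my $upper = join '', 'A' .. 'Z';
    my $lower = join '', 'a' .. 'z';
    return sprintf q{*[translate(local-name(), '%s', '%s') = '%s']},
        $upper, $lower, lc $name;
}

my $dom = XML::LibXML->load_xml( string => $xml );
my $xpc = XML::LibXML::XPathContext->new( $dom->documentElement );

# Assemble /*[...]/*[...]/*[...] from the lowercase names.
my $path = '/' . join '/', map { ci_step($_) } qw(allinvoices invoices invoice);
my @invoices = $xpc->findnodes($path);

print $_->getAttribute('number'), "\n" for @invoices;
```

Generating the steps from a list keeps the long predicates out of the calling code, which is most of what the registered function buys you as well.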