
Safest way to extract all variable references from an XPath expression in Java

I'm using Java and the Saxon processor.

Let's say I have an XPath expression that may contain variable references. I also have custom XPath functions, which can be nested to arbitrary depth and can take variable references as arguments, so the expression can be quite complex.

I'd like to extract the prefix and local name of every variable reference in the XPath expression without evaluating it.

I can extract them by setting a custom XPathVariableResolver on the XPath and evaluating it, but that adds considerable overhead: I only want to extract the variable references, not call the custom functions, which can be time-consuming.

Is it safe to do this with pattern matching on the '$' sign? An API call for this would be ideal. If no such API is available, which cases involving '$' should I watch out for (for example, a '$' inside a string literal, which must not be picked up)?

If every variable is declared (which must be the case within a single stylesheet module), you can simply use this XPath 2.0 expression:

doc(yourUri)//xsl:variable/@name/string()

where the namespace prefix "xsl" must be bound to the namespace "http://www.w3.org/1999/XSL/Transform".

Or, from within an XSLT stylesheet:

document(yourUri)//xsl:variable/@name/string()

You probably also want to get all parameter names:

doc(yourUri)//xsl:param/@name/string()

Or both variable and parameter names:

doc(yourUri)//*[self::xsl:variable or self::xsl:param]/@name/string()

Now, this doesn't get you the set of variables defined within XPath expressions. To do this you'd need an XPath 2.0 parser (and lexer). In the past I developed one (using the FXSL parsing framework) but haven't published it. If you are interested, let me know and I'll send it to you.

Alternatively, for a predefined subset of XSLT attribute names you can analyze their values and look for a dollar sign, possibly followed by whitespace, then followed by a name. None of this may occur within single or double quotes. Such a regular expression isn't too difficult to write.

As a last step, you'd have to deduplicate the variable references obtained this way, for example using xsl:for-each-group.


Update:

Here is a fragment of the XPath 2.0 grammar I am using:

VariableReference   : '$'     QName

QName         : QNAME2 

                  |  OR
                  |  AND
                  |  EQ
                  |  NE
                  |  LT
                  |  LE
                  |  GT
                  |  GE
                  |  IS
                  |  TO
                  |  DIV
                  |  IDIV
                  |  MOD
                  |  UNION
                  |  INTERSECT
                  |  EXCEPT
                  |  THEN
                  |  ELSE
                  |  IN
                  |  RETURN
                  |  SATISFIES

And the terminal symbol QNAME2 is defined in the lexer as follows:

([\i-[:]][\c-[:]]*:)?[\i-[:]][\c-[:]]*
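The pattern above uses XML Schema regex syntax (\i, \c and character-class subtraction), which java.util.regex does not support. A rough Java translation might look like the sketch below. The NCNAME character classes are an assumption that only approximates the XML name productions, and the class and method names (QNamePattern, split) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QNamePattern {
    // Assumed approximation of "\i minus ':'" followed by "\c minus ':'":
    // a letter or '_' first, then letters, digits, marks, '.', '-', '_' or the middle dot.
    static final String NCNAME = "[\\p{L}_][\\p{L}\\p{N}\\p{M}._\\-\\u00B7]*";

    // Optional "prefix:" (group 1) followed by the local name (group 2).
    static final Pattern QNAME = Pattern.compile("(?:(" + NCNAME + "):)?(" + NCNAME + ")");

    /** Returns {prefix, localName}, with "" for a missing prefix, or null if the input is not a QName. */
    static String[] split(String qname) {
        Matcher m = QNAME.matcher(qname);
        if (!m.matches()) {
            return null;
        }
        String prefix = (m.group(1) == null) ? "" : m.group(1);
        return new String[] { prefix, m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("my:var");
        System.out.println("prefix=" + parts[0] + ", local=" + parts[1]);
    }
}
```

Splitting at this stage gives the prefix and local name directly; resolving the prefix to a namespace URI would still need the in-scope namespace bindings.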

Of course, before matching a variable reference, the lexer must first recognize that the '$' is not part of a string literal, which in my lexer I define as:

     ("([^"])*")+
    |
     ('([^'])*')+

Also, you should skip everything that is within comments. I have these regular expressions for comment start and comment end:

  (\(:)         <!-- Comment start --> 

 |
  (:\))         <!-- Comment end --> 
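Putting these pieces together in Java, a single-pass scan could skip string literals and comments and collect each '$' followed by a name. This is only a sketch under stated assumptions, not the unpublished parser mentioned above: the NCNAME character classes approximate the XML name rules, comments are assumed to nest as in XPath 2.0, and the class and method names (VariableRefScanner, extract) are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableRefScanner {
    // Assumed approximation of an NCName; the real XML production is wider.
    private static final String NCNAME = "[\\p{L}_][\\p{L}\\p{N}\\p{M}._\\-\\u00B7]*";

    // '$', optional whitespace, then an optionally prefixed name (group 1).
    private static final Pattern VAR_REF =
            Pattern.compile("\\$\\s*((?:" + NCNAME + ":)?" + NCNAME + ")");

    /** Collects distinct variable names (lexical QNames) in order of first appearance. */
    public static Set<String> extract(String xpath) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = VAR_REF.matcher(xpath);
        int n = xpath.length();
        int i = 0;
        while (i < n) {
            char ch = xpath.charAt(i);
            if (ch == '"' || ch == '\'') {
                // String literal: jump past the closing quote. A doubled quote is then
                // seen as two adjacent literals, which skips the same characters.
                int close = xpath.indexOf(ch, i + 1);
                i = (close < 0) ? n : close + 1;
            } else if (xpath.startsWith("(:", i)) {
                // Comment: track nesting depth until the matching ":)".
                int depth = 1;
                i += 2;
                while (i < n && depth > 0) {
                    if (xpath.startsWith("(:", i)) {
                        depth++;
                        i += 2;
                    } else if (xpath.startsWith(":)", i)) {
                        depth--;
                        i += 2;
                    } else {
                        i++;
                    }
                }
            } else if (ch == '$' && m.region(i, n).lookingAt()) {
                // Variable reference starting exactly at position i.
                names.add(m.group(1));
                i = m.end();
            } else {
                i++;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String xpath = "my:f($a, 'costs $5', $ns:b) (: $hidden :) + $a";
        System.out.println(extract(xpath)); // prints [a, ns:b]
    }
}
```

The LinkedHashSet handles deduplication while preserving order. Note that range variables bound by for, some and every are reported as well, since telling them apart from external variables needs a real parser.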

Use the s9api XPathCompiler class to compile the expression:

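// Requires: import net.sf.saxon.s9api.*;
XPathCompiler c = new Processor(false).newXPathCompiler();
// Allow the expression to reference variables that have not been declared.
c.setAllowUndeclaredVariables(true);
XPathExecutable exp = c.compile(xpathExpression); // may throw SaxonApiException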

The external variables in the expression are then available, as an iterator over their QNames (from which the prefix and local name can be read), by calling:

exp.iterateExternalVariables();

By the way, it's hit and miss whether Saxon questions get noticed here. If you want to be sure of an answer, use the Saxon forum at http://saxonica.plan.io/
