
Need to execute SQL queries over XML data

We currently have several queries whose results we fetch from the database and store in memory. For the same queries, we now need a way to fetch the data from XML rather than from the DB server, i.e. the data would be in XML in the form below:

    <?xml version='1.0'  encoding='Cp1252' ?>
    <tables>
    <TableOne>
    <Row1>
    <Column1>TableOne Row1 Column1 Value</Column1>
    <Column2>TableOne Row1 Column2 Value</Column2>
    <Column3>TableOne Row1 Column3 Value</Column3>
    </Row1>
    <Row2>
    <Column1>TableOne Row2 Column1 Value</Column1>
    <Column2>TableOne Row2 Column2 Value</Column2>
    <Column3>TableOne Row2 Column3 Value</Column3>
    </Row2>
    </TableOne>
    <TableTwo>
    <TableTwoRow1>
    <TableTwoRow1Column1>TableOne Row1 Column1 Value</TableTwoRow1Column1>
    <TableTwoRow1Column2>TableTwoRow1 Column2 Value</TableTwoRow1Column2>
    <TableTwoRow1Column3>TableTwoRow1 Column3 Value</TableTwoRow1Column3>
    </TableTwoRow1>
    <TableTwoRow2>
    <TableTwoRow2Column1>TableTwoRow2 Column1 Value</TableTwoRow2Column1>
    <TableTwoRow2Column2>TableTwoRow2 Column2 Value</TableTwoRow2Column2>
    <TableTwoRow2Column3>TableTwoRow2 Column3 Value</TableTwoRow2Column3>
    </TableTwoRow2>
    </TableTwo>
    </tables>

To achieve this, I have written my own data structure to hold the XML above:

    public class DataObject {

        Map<String, Table> tables = new HashMap<String, Table>();

        public Map<String, Table> getTables() {
            return tables;
        }

        public void setTable(Map<String, Table> table) {
            this.tables = table;
        }

    }

The Table class looks like this:

    public class Table {

        private String tableName;
        private List<Map<String,String>> rowList = new ArrayList<Map<String,String>>();

        public Table(String tableName) {
            super();
            this.tableName = tableName;
        }

        public List<Map<String, String>> getRowList() {
            return rowList;
        }

        public void setRowList(List<Map<String, String>> rowList) {
            this.rowList = rowList;
        }

        public String getTableName() {
            return tableName;
        }

        public void setTableName(String tableName) {
            this.tableName = tableName;
        }

    }

Each row is represented as a Map, and multiple rows as a List. I have been able to load the data from the XML into the 'DataObject' as below:

private DataObject getDataObject(Document document) {

    DataObject dataObject = new DataObject();

    Element rootElement = document.getRootElement();
    List<Element> tableList = rootElement.getChildren();

    for (Element tableElement : tableList) {

        String tableName = tableElement.getName();
        Table table = new Table(tableName);
        dataObject.getTables().put(tableName, table);
        List<Element> rows = tableElement.getChildren();

        for (Element rowElement : rows) {

            Map<String, String> row = new HashMap<String, String>();

            List<Element> columns = rowElement.getChildren();
            for (Element columnElement : columns) {
                String columnName = columnElement.getName();
                String columnValue = columnElement.getValue();
                row.put(columnName, columnValue);
            }

            table.getRowList().add(row);
        }
    }

    return dataObject;

}

Now I can't think of an efficient way to query the dataObject above to execute the queries, and I can't find any built-in API that would help me do that. The queries would use SELECT, GROUP BY, ORDER BY and all the logical operators. Any thoughts would really be appreciated.
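
For reference, the kind of query described here can be sketched directly against a `List<Map<String, String>>` with Java 8 streams. This is only a minimal sketch, not a full SQL engine: the class name `RowQueries`, the `row` helper and the column names `name`, `skill` and `gold` are made up for illustration, and every value is assumed to be a well-formed string as in the row maps above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class RowQueries {

    // Hypothetical helper: builds one row in the same Map<String, String> shape as Table.getRowList().
    static Map<String, String> row(String name, String skill, String gold) {
        Map<String, String> r = new HashMap<>();
        r.put("name", name);
        r.put("skill", skill);
        r.put("gold", gold);
        return r;
    }

    // Roughly: SELECT name FROM rows WHERE skill = ? ORDER BY gold DESC
    static List<String> namesBySkill(List<Map<String, String>> rows, String skill) {
        return rows.stream()
                .filter(r -> skill.equals(r.get("skill")))
                .sorted(Comparator.comparingInt(
                        (Map<String, String> r) -> Integer.parseInt(r.get("gold"))).reversed())
                .map(r -> r.get("name"))
                .collect(Collectors.toList());
    }

    // Roughly: SELECT skill, SUM(gold) FROM rows GROUP BY skill (keys sorted by the TreeMap)
    static Map<String, Integer> goldBySkill(List<Map<String, String>> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> r.get("skill"),
                TreeMap::new,
                Collectors.summingInt(r -> Integer.parseInt(r.get("gold")))));
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("A", "java", "18"));
        rows.add(row("B", "sql", "5"));
        rows.add(row("C", "java", "30"));

        System.out.println(namesBySkill(rows, "java")); // [C, A]
        System.out.println(goldBySkill(rows));          // {java=48, sql=5}
    }
}
```

Because all values are stored as strings, numeric columns have to be parsed on every comparison; a typed row class would avoid that at the cost of a fixed schema.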

Why SQL? Consider the other special-purpose declarative language, one designed specifically for XML: XSLT. Java ships with an XSLT 1.0 processor (derived from Xalan), and external processors such as Saxon add XSLT 2.0 and 3.0 support and also run in Java. In XSLT you can find counterparts to most SQL constructs, since documents are transformed using XPath expressions, template processing, and basic logic and arithmetic operations. Note that XSLT scripts are themselves well-formed XML files (with special instructions)!
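
As a small illustration of an XPath predicate playing the role of a WHERE clause, the sketch below evaluates an expression with the JDK's `javax.xml.xpath` API. The three-row sample document and the class name `XPathWhere` are made up for illustration; only the element names follow the example in this answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathWhere {

    // Made-up sample data: one "table" (topusers) with two "columns" (user, topskill).
    static final String XML = String.join("",
        "<stackoverflow>",
        "<topusers><user>A</user><topskill>java</topskill></topusers>",
        "<topusers><user>B</user><topskill>sql</topskill></topusers>",
        "<topusers><user>C</user><topskill>java</topskill></topusers>",
        "</stackoverflow>");

    // Evaluates an XPath expression and returns the text of each matching node.
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath or unparsable XML", e);
        }
    }

    public static void main(String[] args) {
        // Roughly: SELECT user FROM topusers WHERE topskill = 'java'
        System.out.println(select(XML, "/stackoverflow/topusers[topskill='java']/user")); // [A, C]
    }
}
```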

Below is an example using the top 20 Stack Overflow users as of one week in August 2015, with each SQL statement followed by its XSLT counterpart. In this example, topusers is the table and its children are the columns. Our good friend @JB Nizet, who commented above, is included in the XML!

XML Source

<?xml version='1.0' encoding='UTF-8'?>
<stackoverflow>
  <topusers>
    <user>Gordon Linoff</user>
    <link>https://stackoverflow.com/users/1144035/gordon-linoff</link>
    <goldbadges>18</goldbadges>
    <silverbadges>81</silverbadges>
    <bronzebadges>134</bronzebadges>
    <membership>3 years, 6 months</membership>
    <yearRank>#1</yearRank>
    <totalReputation>320356</totalReputation>
    <yearReputation>72008</yearReputation>
    <topskill>sql</topskill>
  </topusers>
  <topusers>
    <user>Martijn Pieters</user>
    <link>https://stackoverflow.com/users/100297/martijn-pieters</link>
    <goldbadges>36</goldbadges>
    <silverbadges>787</silverbadges>
    <bronzebadges>975</bronzebadges>
    <membership>6 years, 3 months</membership>
    <yearRank>#2</yearRank>
    <totalReputation>385697</totalReputation>
    <yearReputation>66886</yearReputation>
    <topskill>python</topskill>
  </topusers>
  <topusers>
    <user>anubhava</user>
    <link>https://stackoverflow.com/users/548225/anubhava</link>
    <goldbadges>25</goldbadges>
    <silverbadges>100</silverbadges>
    <bronzebadges>173</bronzebadges>
    <membership>4 years, 7 months</membership>
    <yearRank>#3</yearRank>
    <totalReputation>279930</totalReputation>
    <yearReputation>62759</yearReputation>
    <topskill>regex</topskill>
  </topusers>
  <topusers>
    <user>Jon Skeet</user>
    <link>https://stackoverflow.com/users/22656/jon-skeet</link>
    <goldbadges>379</goldbadges>
    <silverbadges>5537</silverbadges>
    <bronzebadges>6720</bronzebadges>
    <membership>6 years, 10 months</membership>
    <yearRank>#4</yearRank>
    <totalReputation>797069</totalReputation>
    <yearReputation>57963</yearReputation>
    <topskill>c#</topskill>
  </topusers>
  <topusers>
    <user>akrun</user>
    <link>https://stackoverflow.com/users/3732271/akrun</link>
    <goldbadges>5</goldbadges>
    <silverbadges>25</silverbadges>
    <bronzebadges>58</bronzebadges>
    <membership>1 year, 1 month</membership>
    <yearRank>#5</yearRank>
    <totalReputation>88928</totalReputation>
    <yearReputation>57384</yearReputation>
    <topskill>r</topskill>
  </topusers>
  <topusers>
    <user>VonC</user>
    <link>https://stackoverflow.com/users/6309/vonc</link>
    <goldbadges>131</goldbadges>
    <silverbadges>1354</silverbadges>
    <bronzebadges>1447</bronzebadges>
    <membership>6 years, 10 months</membership>
    <yearRank>#6</yearRank>
    <totalReputation>512007</totalReputation>
    <yearReputation>56260</yearReputation>
    <topskill>git</topskill>
  </topusers>
  <topusers>
    <user>CommonsWare</user>
    <link>https://stackoverflow.com/users/115145/commonsware</link>
    <goldbadges>53</goldbadges>
    <silverbadges>1104</silverbadges>
    <bronzebadges>1155</bronzebadges>
    <membership>6 years, 2 months</membership>
    <yearRank>#7</yearRank>
    <totalReputation>487584</totalReputation>
    <yearReputation>54921</yearReputation>
    <topskill>android</topskill>
  </topusers>
  <topusers>
    <user>T.J. Crowder</user>
    <link>https://stackoverflow.com/users/157247/t-j-crowder</link>
    <goldbadges>51</goldbadges>
    <silverbadges>556</silverbadges>
    <bronzebadges>692</bronzebadges>
    <membership>5 years, 11 months</membership>
    <yearRank>#8</yearRank>
    <totalReputation>374425</totalReputation>
    <yearReputation>54121</yearReputation>
    <topskill>javascript</topskill>
  </topusers>
  <topusers>
    <user>Hans Passant</user>
    <link>https://stackoverflow.com/users/17034/hans-passant</link>
    <goldbadges>62</goldbadges>
    <silverbadges>679</silverbadges>
    <bronzebadges>1277</bronzebadges>
    <membership>6 years, 10 months</membership>
    <yearRank>#9</yearRank>
    <totalReputation>565002</totalReputation>
    <yearReputation>53886</yearReputation>
    <topskill>c#</topskill>
  </topusers>
  <topusers>
    <user>BalusC</user>
    <link>https://stackoverflow.com/users/157882/balusc</link>
    <goldbadges>148</goldbadges>
    <silverbadges>1925</silverbadges>
    <bronzebadges>2235</bronzebadges>
    <membership>5 years, 11 months</membership>
    <yearRank>#10</yearRank>
    <totalReputation>584857</totalReputation>
    <yearReputation>53693</yearReputation>
    <topskill>java</topskill>
  </topusers>
  <topusers>
    <user>dasblinkenlight</user>
    <link>https://stackoverflow.com/users/335858/dasblinkenlight</link>
    <goldbadges>31</goldbadges>
    <silverbadges>342</silverbadges>
    <bronzebadges>617</bronzebadges>
    <membership>5 years, 3 months</membership>
    <yearRank>#11</yearRank>
    <totalReputation>354707</totalReputation>
    <yearReputation>52779</yearReputation>
    <topskill>java</topskill>
  </topusers>
  <topusers>
    <user>Eran</user>
    <link>https://stackoverflow.com/users/1221571/eran</link>
    <goldbadges>18</goldbadges>
    <silverbadges>114</silverbadges>
    <bronzebadges>187</bronzebadges>
    <membership>3 years, 5 months</membership>
    <yearRank>#12</yearRank>
    <totalReputation>105090</totalReputation>
    <yearReputation>49529</yearReputation>
    <topskill>java</topskill>
  </topusers>
  <topusers>
    <user>Avinash Raj</user>
    <link>https://stackoverflow.com/users/3297613/avinash-raj</link>
    <goldbadges>6</goldbadges>
    <silverbadges>25</silverbadges>
    <bronzebadges>59</bronzebadges>
    <membership>1 year, 5 months</membership>
    <yearRank>#13</yearRank>
    <totalReputation>96627</totalReputation>
    <yearReputation>48928</yearReputation>
    <topskill>regex</topskill>
  </topusers>
  <topusers>
    <user>Arun P Johny</user>
    <link>https://stackoverflow.com/users/114251/arun-p-johny</link>
    <goldbadges>30</goldbadges>
    <silverbadges>194</silverbadges>
    <bronzebadges>254</bronzebadges>
    <membership>6 years, 2 months</membership>
    <yearRank>#14</yearRank>
    <totalReputation>218919</totalReputation>
    <yearReputation>45858</yearReputation>
    <topskill>jquery</topskill>
  </topusers>
  <topusers>
    <user>Alex Martelli</user>
    <link>https://stackoverflow.com/users/95810/alex-martelli</link>
    <goldbadges>64</goldbadges>
    <silverbadges>752</silverbadges>
    <bronzebadges>1056</bronzebadges>
    <membership>6 years, 3 months</membership>
    <yearRank>#15</yearRank>
    <totalReputation>402886</totalReputation>
    <yearReputation>44772</yearReputation>
    <topskill>python</topskill>
  </topusers>
  <topusers>
    <user>unutbu</user>
    <link>https://stackoverflow.com/users/190597/unutbu</link>
    <goldbadges>28</goldbadges>
    <silverbadges>473</silverbadges>
    <bronzebadges>617</bronzebadges>
    <membership>5 years, 9 months</membership>
    <yearRank>#16</yearRank>
    <totalReputation>295951</totalReputation>
    <yearReputation>44389</yearReputation>
    <topskill>python</topskill>
  </topusers>
  <topusers>
    <user>JB Nizet</user>
    <link>https://stackoverflow.com/users/571407/jb-nizet</link>
    <goldbadges>22</goldbadges>
    <silverbadges>307</silverbadges>
    <bronzebadges>491</bronzebadges>
    <membership>4 years, 6 months</membership>
    <yearRank>#17</yearRank>
    <totalReputation>319659</totalReputation>
    <yearReputation>43697</yearReputation>
    <topskill>java</topskill>
  </topusers>
  <topusers>
    <user>Quentin</user>
    <link>https://stackoverflow.com/users/19068/quentin</link>
    <goldbadges>42</goldbadges>
    <silverbadges>491</silverbadges>
    <bronzebadges>664</bronzebadges>
    <membership>6 years, 10 months</membership>
    <yearRank>#18</yearRank>
    <totalReputation>397762</totalReputation>
    <yearReputation>43626</yearReputation>
    <topskill>javascript</topskill>
  </topusers>
  <topusers>
    <user>Darin Dimitrov</user>
    <link>https://stackoverflow.com/users/29407/darin-dimitrov</link>
    <goldbadges>108</goldbadges>
    <silverbadges>2089</silverbadges>
    <bronzebadges>2107</bronzebadges>
    <membership>6 years, 9 months</membership>
    <yearRank>#19</yearRank>
    <totalReputation>604319</totalReputation>
    <yearReputation>43621</yearReputation>
    <topskill>c#</topskill>
  </topusers>
  <topusers>
    <user>alecxe</user>
    <link>https://stackoverflow.com/users/771848/alecxe</link>
    <goldbadges>19</goldbadges>
    <silverbadges>108</silverbadges>
    <bronzebadges>194</bronzebadges>
    <membership>4 years, 2 months</membership>
    <yearRank>#20</yearRank>
    <totalReputation>116413</totalReputation>
    <yearReputation>42215</yearReputation>
    <topskill>python</topskill>
  </topusers>
</stackoverflow>

SELECT statement

SELECT user, membership, link
FROM topusers
WHERE topskill = 'java'

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <xsl:template match="/stackoverflow">
    <xsl:copy>      
      <xsl:apply-templates select="topusers[topskill='java']"/>
    </xsl:copy>
  </xsl:template>  

  <xsl:template match="topusers">
    <xsl:copy>      
      <xsl:copy-of select="user"/>
      <xsl:copy-of select="membership"/>
      <xsl:copy-of select="link"/>
    </xsl:copy>
  </xsl:template>

</xsl:transform>

GROUP BY

SELECT topskill AS skill, SUM(goldbadges) AS SumOfGoldBadges,
       SUM(silverbadges) AS SumOfSilverBadges, SUM(bronzebadges) AS SumOfBronzeBadges,
       AVG(totalReputation) AS AvgOfReps,
       MIN(yearReputation) AS MinOfYrReps, MAX(yearReputation) AS MaxOfYrReps
FROM topusers
GROUP BY topskill

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
               xmlns:exsl="http://exslt.org/common" extension-element-prefixes="exsl">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

<xsl:key name="skillkey" match="topusers" use="topskill" />

  <xsl:template match="/stackoverflow">
    <xsl:copy>
      <xsl:apply-templates select="topusers[generate-id() = generate-id(key('skillkey', topskill)[1])]"/>
    </xsl:copy>
  </xsl:template>

  <!-- Muenchian Method -->
  <xsl:template match="topusers">
    <xsl:variable name="curr-group" select="key('skillkey', topskill)" />

    <skill_group>      
      <skill><xsl:value-of select="$curr-group/topskill"/></skill>
      <SumOfGoldBadges><xsl:value-of select="sum($curr-group/goldbadges)"/></SumOfGoldBadges>
      <SumOfSilverBadges><xsl:value-of select="sum($curr-group/silverbadges)"/></SumOfSilverBadges>
      <SumOfBronzeBadges><xsl:value-of select="sum($curr-group/bronzebadges)"/></SumOfBronzeBadges>
      <AvgOfReps><xsl:value-of select="sum($curr-group/totalReputation)
                                            div count($curr-group/totalReputation)"/></AvgOfReps>

         <xsl:variable name="repsSorted">
              <xsl:for-each select="$curr-group">
                  <xsl:sort select="yearReputation" data-type="number" order="ascending"/>
                  <xsl:copy-of select="yearReputation"/>
              </xsl:for-each>
          </xsl:variable>
          <xsl:variable name="repsSortedSet" select="exsl:node-set($repsSorted)/yearReputation" />

      <MinOfYrReps><xsl:value-of select="$repsSortedSet[1]"/></MinOfYrReps>
      <MaxOfYrReps><xsl:value-of select="$repsSortedSet[last()]"/></MaxOfYrReps>        
    </skill_group>    
  </xsl:template>

</xsl:transform>
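
To see the Muenchian grouping in action without any files, the sketch below runs a cut-down version of the stylesheet above (only the sum of gold badges per skill, written as plain text rather than XML) through the JDK's built-in transformer. The three-row sample and the class name `GroupByDemo` are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class GroupByDemo {

    // Made-up three-row sample in the same shape as the XML source above.
    static final String XML = String.join("",
        "<stackoverflow>",
        "<topusers><user>A</user><goldbadges>18</goldbadges><topskill>java</topskill></topusers>",
        "<topusers><user>B</user><goldbadges>5</goldbadges><topskill>sql</topskill></topusers>",
        "<topusers><user>C</user><goldbadges>30</goldbadges><topskill>java</topskill></topusers>",
        "</stackoverflow>");

    // Roughly: SELECT topskill, SUM(goldbadges) FROM topusers GROUP BY topskill
    static final String XSLT = String.join("\n",
        "<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">",
        "  <xsl:output method=\"text\"/>",
        "  <xsl:key name=\"skillkey\" match=\"topusers\" use=\"topskill\"/>",
        "  <xsl:template match=\"/stackoverflow\">",
        "    <!-- Visit only the first row of each skill group (Muenchian Method) -->",
        "    <xsl:for-each select=\"topusers[generate-id() = generate-id(key('skillkey', topskill)[1])]\">",
        "      <xsl:value-of select=\"topskill\"/>",
        "      <xsl:text>=</xsl:text>",
        "      <xsl:value-of select=\"sum(key('skillkey', topskill)/goldbadges)\"/>",
        "      <xsl:text>;</xsl:text>",
        "    </xsl:for-each>",
        "  </xsl:template>",
        "</xsl:transform>");

    // Applies a stylesheet to an XML string and returns the serialized result.
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSLT)); // java=48;sql=5;
    }
}
```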

ORDER BY

SELECT * FROM topusers
ORDER BY goldbadges DESC

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- Identity Transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>  

  <xsl:template match="stackoverflow">
    <xsl:copy>
      <xsl:apply-templates select="topusers">
        <xsl:sort select="goldbadges" order="descending" data-type="number"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

</xsl:transform>

CASE/WHEN

SELECT user, membership, link,  
       CASE 
            WHEN membership LIKE '%1 year%' OR membership LIKE '%2 years%'
                 OR membership LIKE '%3 years%'
            THEN 'recent member'
            WHEN membership LIKE '%4 years%' OR membership LIKE '%5 years%'
            THEN 'mid member'
            ELSE 'long member'
       END AS member_type
FROM topusers

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <xsl:template match="/stackoverflow">
    <xsl:copy>      
      <xsl:apply-templates select="topusers"/>
    </xsl:copy>
  </xsl:template>  

  <xsl:template match="topusers">
    <xsl:copy>
      <!-- Columns common to every branch are copied once -->
      <xsl:copy-of select="user"/>
      <xsl:copy-of select="membership"/>
      <xsl:copy-of select="link"/>
      <member_type>
        <xsl:choose>
          <xsl:when test="contains(membership, '1 year') or contains(membership, '2 years')
                          or contains(membership, '3 years')">recent member</xsl:when>
          <xsl:when test="contains(membership, '4 years') or contains(membership, '5 years')">mid member</xsl:when>
          <xsl:otherwise>long member</xsl:otherwise>
        </xsl:choose>
      </member_type>
    </xsl:copy>
  </xsl:template>

</xsl:transform>

Java (to run an XSLT script on XML source)

import java.io.File;
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class CourseList {
    public static void main(String[] args) throws IOException, SAXException,
                                                  ParserConfigurationException,
                                                  TransformerException {

            // Load XML document
            String inputXML = "/path/to/Input.xml";
            String outputXML = "/path/to/Output.xml";

            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            docFactory.setNamespaceAware(true);   // XSLT processors expect a namespace-aware DOM
            DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
            Document doc = docBuilder.parse(new File(inputXML));

            // Option 1: static XSLT loaded from a file (use in place of Option 2)
            // String xslFile = "/path/to/XSLT/Script.xsl";
            // Source xslt = new StreamSource(new File(xslFile));

            // Option 2: dynamic XSLT built from table and column names
            String root = "stackoverflow";
            String tableName = "topusers";
            String column1 = "user";
            String column2 = "membership";
            String column3 = "link";
            String column4 = "topskill";
            String whereValue = "java";

            // String.join requires Java 8 or later
            String xslStr = String.join("\n",
                "<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">",
                "<xsl:output version=\"1.0\" encoding=\"UTF-8\" indent=\"yes\" />",
                "<xsl:strip-space elements=\"*\"/> ",
                "  <xsl:template match=\"/"+root+"\">",
                "    <xsl:copy>",
                "      <xsl:apply-templates select=\""+tableName+"["+column4+"='"+whereValue+"']\"/>",
                "    </xsl:copy>",
                "  </xsl:template>",
                "  <xsl:template match=\""+tableName+"\">",
                "    <xsl:copy>",
                "      <xsl:copy-of select=\""+column1+"\"/>",
                "      <xsl:copy-of select=\""+column2+"\"/>",
                "      <xsl:copy-of select=\""+column3+"\"/>",
                "    </xsl:copy>",
                "  </xsl:template>",
                "</xsl:transform>");

            // Parse XSLT string and configure transformer
            Source xslt = new StreamSource(new StringReader(xslStr));
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(xslt);

            // Pretty-print settings
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
            transformer.setOutputProperty(OutputKeys.STANDALONE, "yes");
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            // Xalan-specific property that sets the indentation width
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");

            // Run the transformation and write the result to file
            DOMSource source = new DOMSource(doc);
            StreamResult result = new StreamResult(new File(outputXML));
            transformer.transform(source, result);
    }
}
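
The dynamic XSLT above splices the WHERE value straight into the stylesheet text, so a value containing a quote would break the XPath expression. One possible alternative, sketched below, is to keep the stylesheet fixed and pass the value in through `xsl:param` and `Transformer.setParameter`. The sample XML, the class name `ParamQueryDemo`, the parameter name `skill` and the plain-text output format are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ParamQueryDemo {

    // Made-up three-row sample in the same shape as the XML source above.
    static final String XML = String.join("",
        "<stackoverflow>",
        "<topusers><user>A</user><topskill>java</topskill></topusers>",
        "<topusers><user>B</user><topskill>sql</topskill></topusers>",
        "<topusers><user>C</user><topskill>java</topskill></topusers>",
        "</stackoverflow>");

    // Fixed stylesheet; the WHERE value arrives as the global parameter $skill.
    static final String XSLT = String.join("\n",
        "<xsl:transform xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">",
        "  <xsl:output method=\"text\"/>",
        "  <xsl:param name=\"skill\"/>",
        "  <xsl:template match=\"/stackoverflow\">",
        "    <xsl:for-each select=\"topusers[topskill = $skill]\">",
        "      <xsl:value-of select=\"user\"/>",
        "      <xsl:text>;</xsl:text>",
        "    </xsl:for-each>",
        "  </xsl:template>",
        "</xsl:transform>");

    // Roughly: SELECT user FROM topusers WHERE topskill = :skill
    static String usersWithSkill(String xml, String skill) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setParameter("skill", skill);   // bound as a string value, never parsed as XPath
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(usersWithSkill(XML, "java")); // A;C;
        System.out.println(usersWithSkill(XML, "sql"));  // B;
    }
}
```

Table and column names still have to be concatenated if they vary, since a parameter can only stand in for a value, not for an element name in a path.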
