How to parse a big RDF file in RDF4J

I want to parse a huge RDF/XML file with RDF4J using the following code, but I get an exception because of a parser limit:

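import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.model.impl.LinkedHashModel;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFHandlerException;
import org.eclipse.rdf4j.rio.RDFParseException;
import org.eclipse.rdf4j.rio.RDFParser;
import org.eclipse.rdf4j.rio.RDFWriter;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.helpers.StatementCollector;

public class ConvertOntology {

    public static void main(String[] args) throws RDFParseException, RDFHandlerException, IOException {

        File initialFile = new File("swetodblp_april_2008.rdf");

        // Parse the RDF/XML file and collect every statement in an in-memory model.
        RDFParser parser = Rio.createParser(RDFFormat.RDFXML);
        parser.setPreserveBNodeIDs(true);
        Model model = new LinkedHashModel();
        parser.setRDFHandler(new StatementCollector(model));
        try (InputStream input = new FileInputStream(initialFile)) {
            // The base URI must be an absolute URI, not a plain file system path.
            parser.parse(input, initialFile.toURI().toString());
        }

        // Write the statements back out as N-Triples, matching the .nt extension.
        // Handler exceptions are no longer swallowed; they propagate to the caller.
        try (OutputStream out = new FileOutputStream("swetodblp_april_2008.nt")) {
            RDFWriter writer = Rio.createWriter(RDFFormat.NTRIPLES, out);
            writer.startRDF();
            for (Statement st : model) {
                writer.handleStatement(st);
            }
            writer.endRDF();
        }
    }
}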

The parser has encountered more than "100,000" entity expansions in this document; this is the limit imposed by the application.

I run my code as follows, setting the two parameters suggested on the RDF4J web site:

mvn -Djdk.xml.totalEntitySizeLimit=0 -DentityExpansionLimit=0 exec:java
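
For reference, the same two properties can also be set from inside the program rather than on the Maven command line. The following is only a minimal sketch: the class name XmlLimits and the method disableEntityLimits are hypothetical, and it assumes the properties are set before the first XML parser is created. The JDK's built-in parser treats a value of 0 as "no limit".

```java
public class XmlLimits {

    /**
     * Sets the same two system properties that the mvn command passes with -D.
     * Hypothetical helper; call it before any XML parser is created.
     */
    static void disableEntityLimits() {
        System.setProperty("jdk.xml.totalEntitySizeLimit", "0");
        System.setProperty("entityExpansionLimit", "0");
    }

    public static void main(String[] args) {
        disableEntityLimits();
        System.out.println("jdk.xml.totalEntitySizeLimit = "
                + System.getProperty("jdk.xml.totalEntitySizeLimit"));
        System.out.println("entityExpansionLimit = "
                + System.getProperty("entityExpansionLimit"));
    }
}
```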

Any help would be appreciated.

The error comes from the Apache Xerces XML parser, which is being used instead of the default JDK XML parser. Delete the Xerces folder from your .m2 repository and the code works fine.
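
To check which parser is in play before deleting anything, a small diagnostic can help. This is a minimal sketch using only the JDK: the class XmlParserCheck and its methods are hypothetical names, and it assumes the standalone Apache Xerces classes live under org.apache.xerces while the JDK's built-in copy lives under com.sun.org.apache.xerces.internal. Run it with the same classpath as the conversion program.

```java
import javax.xml.parsers.SAXParserFactory;

public class XmlParserCheck {

    /** Returns true if a class with the given name can be loaded from the classpath. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns true if the class name belongs to the JDK's built-in XML parser. */
    static boolean isJdkBuiltIn(String className) {
        return className.startsWith("com.sun.org.apache.xerces.internal.");
    }

    public static void main(String[] args) {
        // Standalone Apache Xerces ships its SAX parser under this name.
        boolean xerces = isOnClasspath("org.apache.xerces.parsers.SAXParser");
        System.out.println("Apache Xerces on classpath: " + xerces);

        // The factory JAXP selects by default for this classpath.
        String factory = SAXParserFactory.newInstance().getClass().getName();
        System.out.println("JAXP SAX factory: " + factory);
        System.out.println("JDK built-in parser: " + isJdkBuiltIn(factory));
    }
}
```

If the first line prints true, some dependency is pulling in Xerces. Running mvn dependency:tree shows which one.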
