How to increase the speed of Saxon evaluation in C#?

I'm currently using Saxon to process XQuery in our .NET application. We're working with really big XML files (~2 GB). When I run the XQuery against one of these files using the Saxon executable directly, the evaluation takes around 2 minutes to complete. When I run the same evaluation from my C# application, the elapsed time increases to around 10 minutes, and I haven't yet been able to identify what I'm doing wrong.

This is what I run from the command line when I use the Saxon executable:

Query.exe -config:config.xml -q:XQueryTest.txt

These are the contents of config.xml:

<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="HE">
  <xquery defaultElementNamespace="http://www.irs.gov/efile"/>
</configuration>

XQueryTest.txt contains the XQuery we want to process. When running the XQuery from the command line, we modify it to use the doc() function to point at the file we want to run it against. Here is a sample line:

for 
    $ReturnData at $currentReturnDataPos in if(exists(doc("2GB.XML")/Return/ReturnData)) then doc("2GB.XML")/Return/ReturnData else element{'ReturnData'} {''}

As mentioned above, this command takes about 2 minutes to complete.

Now, this is what I'm doing in my .NET application to run the same evaluation:

Processor processor = new Processor();
DocumentBuilder documentBuilder = processor.NewDocumentBuilder();
documentBuilder.IsLineNumbering = true;
documentBuilder.WhitespacePolicy = WhitespacePolicy.PreserveAll;
XQueryCompiler compiler = processor.NewXQueryCompiler();

string query = BuildXqueryString();

if (!String.IsNullOrEmpty(query))
{
    XQueryExecutable executable = compiler.Compile(query);
    XQueryEvaluator evaluator = executable.Load();

    using (XmlReader myReader = XmlReader.Create(@"C:\Users\Administrator\Desktop\2GB.xml"))
    {
        evaluator.ContextItem = documentBuilder.Build(myReader);
    }

    var evaluations = evaluator.Evaluate();
}

The problem is in this line: evaluator.ContextItem = documentBuilder.Build(myReader). That isn't even the evaluation, just the loading of the file. This line takes far too long to execute, and I need to know whether that is expected or whether there's a way to speed it up. I have tried all the different overloads of the Build() method, and they all take a long time to complete, far more than the 2 minutes the whole run takes from the command line.

As for using Saxon's streaming capability to read the file in parts: because of the XQueries we generate, that is not an option, since a query can combine information from any part of the XML.

We have seen a similar 5-to-1 ratio between Saxon on the Java platform and Saxon on the .NET platform in some cases, and despite extensive investigation we haven't got to the bottom of why it happens. Part of the reason is that it seems to be inconsistent. When we first shipped Saxon on .NET using the IKVMC cross-compiler, the ratio was much better, with only about a 25% overhead on .NET. A number of things have changed since then, though: Java VMs have got faster, IKVMC has switched from the GNU Classpath library to OpenJDK, and .NET itself hasn't stood still.

It's new to me, though, that the same code should run much faster from the .NET command line than from the .NET API.

The big difference here is that when you run from the command line, Saxon builds the document using the Apache Xerces parser (converted to .NET code using IKVMC), whereas when you call DocumentBuilder.Build() in the way shown, you are using Microsoft's XmlReader.

I would expect document building to run fastest when you supply a (file system) URI, but I can't say I've measured it. It might be worth doing some experiments (perhaps with smaller files) and showing us the results. Alternatively, have you tried calling the doc() function from your application's query, rather than building the document first?
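One way to run the experiment suggested above is a small timing harness. The sketch below assumes the Saxon-HE .NET API (Saxon.Api) is referenced; the class BuildTiming and its helper methods are hypothetical names. It loads the same (smaller) file three ways: through Microsoft's XmlReader, through a file Uri so that Saxon parses with its bundled parser, and through doc() inside a query. It is a rough comparison, not a rigorous benchmark: it runs two rounds because the first includes JIT warm-up, and the doc() variant uses the processor's default settings rather than the builder settings from the question.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;
using Saxon.Api;

// Rough timing harness (a sketch): loads the same file three ways
// and prints how long each takes.
public static class BuildTiming
{
    public static void Main(string[] args)
    {
        // Path to a smaller sample file, taken from the command line.
        string path = Path.GetFullPath(args[0]);
        Processor processor = new Processor();

        // Two rounds, so the second is less affected by JIT warm-up.
        for (int round = 1; round <= 2; round++)
        {
            Console.WriteLine($"Round {round}");
            Console.WriteLine($"  Build(XmlReader): {Time(() => BuildFromReader(processor, path))} ms");
            Console.WriteLine($"  Build(Uri):       {Time(() => BuildFromUri(processor, path))} ms");
            Console.WriteLine($"  doc() in query:   {Time(() => LoadWithDoc(processor, path))} ms");
        }
    }

    // Same builder settings as in the question's code.
    static DocumentBuilder NewBuilder(Processor processor)
    {
        DocumentBuilder builder = processor.NewDocumentBuilder();
        builder.IsLineNumbering = true;
        builder.WhitespacePolicy = WhitespacePolicy.PreserveAll;
        return builder;
    }

    // System.Xml's XmlReader does the parsing; Saxon receives its events.
    public static XdmNode BuildFromReader(Processor processor, string path)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            return NewBuilder(processor).Build(reader);
        }
    }

    // Saxon parses the file itself, given a file:/// URI.
    public static XdmNode BuildFromUri(Processor processor, string path)
    {
        return NewBuilder(processor).Build(new Uri(path));
    }

    // The query loads the document with doc(), as in the command-line run.
    // Assumes the path contains no apostrophe (it is spliced into a string literal).
    public static XdmValue LoadWithDoc(Processor processor, string path)
    {
        string uri = new Uri(path).AbsoluteUri;
        XQueryExecutable exe = processor.NewXQueryCompiler().Compile("doc('" + uri + "')");
        return exe.Load().Evaluate();
    }

    // Runs one load and returns the elapsed wall-clock time in milliseconds.
    static long Time<T>(Func<T> load)
    {
        Stopwatch sw = Stopwatch.StartNew();
        load();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

Run it as, for example, `BuildTiming.exe sample.xml` against a file of a few hundred MB, and compare the second-round figures.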

The slow performance is caused by using the .NET XmlReader to do the parsing. The push/pull SAX-style event handling between the .NET XML parser and the Saxon receiver is much slower than using the JAXP Xerces parser that is supplied within Saxon directly.

To force the JAXP parser, the following should work:

evaluator.ContextItem = documentBuilder.Build(new Uri("file:///C:\\Users\\Administrator\\Desktop\\2GB.xml"));
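For context, here is a sketch of the question's evaluation code with the XmlReader block replaced by a Uri. It assumes the Saxon-HE .NET API (Saxon.Api); the class UriBuildExample and its Evaluate method are hypothetical, and the query parameter stands in for the result of the question's BuildXqueryString(). Constructing the Uri from a plain local path avoids writing the file:/// form and its escaped backslashes by hand.

```csharp
using System;
using Saxon.Api;

public static class UriBuildExample
{
    // The question's flow, but the context document is built from a Uri,
    // so Saxon parses the file with its own bundled parser instead of System.Xml.
    public static XdmValue Evaluate(string xmlPath, string query)
    {
        Processor processor = new Processor();
        DocumentBuilder documentBuilder = processor.NewDocumentBuilder();
        documentBuilder.IsLineNumbering = true;
        documentBuilder.WhitespacePolicy = WhitespacePolicy.PreserveAll;

        XQueryEvaluator evaluator = processor.NewXQueryCompiler().Compile(query).Load();

        // new Uri(@"C:\...\2GB.xml") yields a file:/// URI for the local file.
        evaluator.ContextItem = documentBuilder.Build(new Uri(xmlPath));

        return evaluator.Evaluate();
    }
}
```

Usage: `XdmValue result = UriBuildExample.Evaluate(@"C:\Users\Administrator\Desktop\2GB.xml", BuildXqueryString());`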
