
Robust JSON parser in Python or Java

I'm looking for a robust JSON parser in either Python or Java. So far I've been working with Python, but since I'm using it to analyze a Java benchmark, using Java is a reasonable alternative.

Robust, that is, with respect to truncated and incomplete documents.

The reason is that I'm currently using Caliper for some (micro-)benchmarks, and while a benchmark is still running (or if I canceled it prematurely), the output file is not a complete JSON document. Neither json nor simplejson will read these files, which are essentially truncated at some point.

(I don't like the Caliper web interface, because it is slow, does not scale to large experiment sets, and a lot of data fails to submit and is then missing from the run.)

Roughly, the documents look like this:

[
  {
    // first record, in multiple lines
  },
  {
    // second record, in multiple lines
  },
  {
    // truncated record.

Right now, I'm using a nasty hack that relies on the indentation Caliper currently produces: it splits the result document at },\n\ \ { into chunks, then parses these one by one until the last one fails. That is fragile and not robust against future changes to Caliper's output. I also tried using raw_decode, but it still expects a complete document and does not return a meaningful result at each }, boundary.

I'm looking for an event-based API similar to e.g. XML pull parsing, which would let me access the document up to the point where it was truncated. Essentially, I'm interested in all complete {} sections inside the wrapping [].

Jackson supports event-based parsing. It also allows you to stream the document while using the tree API for the parts that interest you. There's a blog post demonstrating this approach here.
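For the Python side, a similar effect can be sketched with the standard library alone. This is only a minimal sketch, not the Jackson approach described above: it assumes the whole file fits in memory and that the top-level value is an array, as in the question. The helper name iter_complete_records is hypothetical. The idea is to skip the opening [ manually and call json.JSONDecoder.raw_decode once per element, stopping at the first element that fails to decode.

```python
import json


def iter_complete_records(text):
    """Yield each fully written element of a top-level JSON array.

    Hypothetical helper: stops quietly at the first element that
    cannot be decoded, which is where a truncated file ends.
    """
    decoder = json.JSONDecoder()
    pos = text.find("[")
    if pos < 0:
        return  # no array start found at all
    pos += 1
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here,
        # together with the comma that separates two elements.
        while pos < end and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= end or text[pos] == "]":
            return  # end of input, or the array was properly closed
        try:
            record, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            return  # truncated (or malformed) element: stop here
        yield record


if __name__ == "__main__":
    sample = '[\n  {"run": 1},\n  {"run": 2},\n  {"run": '
    for record in iter_complete_records(sample):
        print(record)
```

Because the function only looks for the next value after whitespace and commas, it does not depend on how Caliper indents its output. Note that it treats a malformed element the same way as a truncated one, so it is not a validator.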
