
Getting InputStream from OMElement

I tried the following sample [1], but my OMElement is very large (I am converting an 800 MB file, which comes from another process, into an OMElement), so I run into the following issues:

  • The process runs out of memory.
  • Serialization takes a long time.

Can anyone point me to the right solution?

[1]

 BufferedReader in = null;
 ByteArrayOutputStream baos = null;
 InputStream is = null;
 try {

    baos = new ByteArrayOutputStream();
    fileContent.serialize(baos);

    is = new ByteArrayInputStream(baos.toByteArray());

    in = new BufferedReader(new InputStreamReader(is));

Unfortunately, your question doesn't provide a clear description of the actual problem you are trying to solve. Instead, it describes an issue with what you believe to be the solution to your problem. Therefore I can only try to reconstruct the problem based on the comments you made in response to Ian Roberts.

If my interpretation of these comments is correct, then the problem is as follows. You have an XML document that contains an element with a long sequence of characters, structured into multiple lines:

<some_element>
line 1
line 2
line 3
...
line N
</some_element>

You want to process the content of the element line by line, but N is large, so you need a memory-efficient way to do that, i.e. an approach that avoids loading the entire content into memory.

The code snippet you have provided shows that you went in the wrong direction when trying to solve that problem. The code serializes the OMElement representing some_element into a byte array and then creates an InputStream / Reader from the serialized output. Besides holding the whole serialized document in memory, that output also contains the start and end tags of some_element, which is not what you want. You are only interested in the content of the element. If you look at the OMElement interface, you can see that it defines a method that returns that content as a Reader. It is called getTextAsStream, and its Javadoc explains how to use the method in such a way that the memory usage is O(1) instead of O(N).
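As a rough illustration of the line-by-line part, here is a minimal sketch that uses only the JDK. The class name ElementTextLines and the helper forEachLine are hypothetical names chosen for this example, and skipping blank lines is my own assumption (the text usually starts with a line break right after the start tag). In real code the Reader would come from getTextAsStream on your element instead of a StringReader. As far as I understand the Javadoc, the O(1) behaviour is only to be expected when the element comes from a deferred-parsing builder, has not been built yet, and caching is turned off.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

public class ElementTextLines {

    /**
     * Passes each non-blank line of the given text to the handler,
     * keeping only the current line in memory.
     * Returns the number of lines handed to the handler.
     */
    public static long forEachLine(Reader text, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader in = new BufferedReader(text)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // assumption: blank lines next to the tags are not data
                }
                handler.accept(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // With Axiom on the classpath, the Reader would instead be obtained as:
        //   Reader text = someElement.getTextAsStream(false); // false = do not cache
        // A StringReader stands in here so that the sketch runs without Axiom.
        Reader text = new StringReader("\nline 1\nline 2\nline 3\n");
        long n = forEachLine(text, System.out::println);
        System.out.println(n + " lines processed");
    }
}
```

Because the handler sees one line at a time and nothing is collected, memory use does not grow with N, provided the Reader itself streams from the parser rather than from a fully built tree.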
