Handling strings larger than 2 GB

I have an application that opens an XLS file containing a lot of user-entered data and converts that data to XML. I have already mapped the columns in the XLS file to XML Maps. When I call the ExportXml method of the XmlMap, I get a string with the proper XML representation of the XLS file. I parse this string a bit and upload it to my server.

The problem is that when the XLS file is really large, the XML string it produces is over 2 GB and I get an OutOfMemoryException. I understand that the limit for CLR objects is 2 GB, but I still need to handle this scenario. At present I just show a message asking the user to send less data.

Any ideas on how I can do this?

EDIT:

This is just the gist of the operations I need to perform on the generated XML:

  • Remove certain fields which are not needed for the server data.
  • Add something like ID numbers for each row of data.
  • Modify the values of certain elements.
  • Do validation on the data.

While streaming with XmlReader is a good idea, I cannot perform these operations that way. Data validation can be done by Excel itself, but the other operations cannot be done there.

Using XmlTextReader and XmlTextWriter with a custom method for each step is a solution I had thought of. But working through the list above that way means the XML document has to be read and processed 4 times, which is just not efficient.

If the XML is that large, then you might be able to use Export to write it to a temporary file, rather than using ExportXml to build a string: http://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel.xmlmap.export.aspx

If you then need to parse or handle the XML in C#, you will probably be better off, for XML structures this large, implementing a custom XmlReader (or XmlWriter) that works at the stream level. See this question for some similar advice: "What is the best way to parse large XML (size of 1GB) in C#?"
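A minimal sketch of that stream-level approach is shown here, not a definitive implementation. It assumes the exported XML has repeating Row elements, an unwanted Internal field, and a numeric Price field; those element names, the id attribute, and the StreamingXmlTransform class are all hypothetical. It suggests that the four operations listed in the question (remove fields, add IDs, modify values, validate) can share one forward-only pass, so the document is never held in memory as a whole.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

static class StreamingXmlTransform
{
    // Copies XML from input to output in a single forward-only pass.
    // "Row", "Internal", "Price" and "id" are hypothetical names for illustration.
    public static int Transform(TextReader input, TextWriter output)
    {
        int rowCount = 0;
        string currentElement = null;
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true };
        // The XML declaration is omitted only to keep the sketch short.
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlReader reader = XmlReader.Create(input, readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            reader.Read();
            while (!reader.EOF)
            {
                // 1. Remove fields the server does not need.
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "Internal")
                {
                    reader.Skip(); // already positioned on the next node, so no Read()
                    continue;
                }

                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        bool isEmpty = reader.IsEmptyElement;
                        currentElement = reader.LocalName;
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        // 2. Add an ID number to each row (assumes rows have no "id" yet).
                        if (currentElement == "Row")
                        {
                            rowCount++;
                            writer.WriteAttributeString("id", rowCount.ToString(CultureInfo.InvariantCulture));
                        }
                        writer.WriteAttributes(reader, false);
                        if (isEmpty)
                        {
                            writer.WriteEndElement(); // empty elements produce no EndElement node
                            currentElement = null;
                        }
                        break;

                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        string value = reader.Value;
                        if (currentElement == "Price")
                        {
                            // 4. Validate, then 3. modify the value.
                            decimal price;
                            if (!decimal.TryParse(value, NumberStyles.Number, CultureInfo.InvariantCulture, out price))
                                throw new InvalidDataException("Invalid Price in row " + rowCount + ": " + value);
                            value = price.ToString("0.00", CultureInfo.InvariantCulture);
                        }
                        writer.WriteString(value);
                        break;

                    case XmlNodeType.EndElement:
                        writer.WriteEndElement();
                        currentElement = null;
                        break;

                    // Comments, processing instructions and the declaration are dropped in this sketch.
                }
                reader.Read();
            }
        }
        return rowCount;
    }
}
```

For real files, the same method could be called with a StreamReader over the temporary file written by Export and a StreamWriter over File.Create for the result, so memory use stays roughly constant regardless of file size.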

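I suppose there is no way around using an x64 OS and framework if you really need to hold the whole thing in RAM, but processing the data some other way, as Stuart suggested, is probably the better approach...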

What you need to do is use "stream chaining", i.e. you open an input stream that reads from your Excel file and an output stream that writes to your XML file. Your conversion class or method then takes the two streams as input and reads just enough data from the input stream to be able to write the next piece of output.

Edit: a very simple, minimal example

Converting from a file containing:

  123
  1244125
  345345345 
  4566
  11 

to

  <List>
      <ListItem>123</ListItem>
      <ListItem>1244125</ListItem>
      ...
  </List>

using

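  // requires: using System.IO; using System.Security;
  void Convert(Stream fromStream, Stream toStream)
  {
     using (StreamReader reader = new StreamReader(fromStream))
     using (StreamWriter writer = new StreamWriter(toStream))
     {
        writer.WriteLine("<List>");
        string line;
        // Read one line at a time; in this case a single line is a sufficient chunk,
        // so only that line is ever held in memory.
        while ((line = reader.ReadLine()) != null)
        {
            // ReadLine already strips the line terminator; trim stray spaces
            // and escape characters that are special in XML.
            writer.WriteLine("<ListItem>{0}</ListItem>", SecurityElement.Escape(line.Trim()));
        }
        writer.WriteLine("</List>");
     }
  }

  // File.Create truncates an existing file; File.OpenWrite would leave old trailing bytes in place.
  Convert(File.OpenRead("source.xls"), File.Create("source.xml"));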

Of course, you could do this in a much more elegant, abstract manner, but this is only meant to illustrate my point.
