Is casting an expensive operation?

Scenario:

  • I am parsing a big file (a character file), for example a .csv file (not exactly my case).
  • I cannot hold the entire file in memory, so I must implement a buffering strategy.
  • I want to build a generic handler that keeps a constant number of lines in memory (as Strings). This handler fetches further lines when necessary and discards the lines that are no longer needed.
  • On top of this handler I will build a parser that transforms the lines into Java objects and applies changes to those objects. Once the changes are done (updating some fields on the objects), the parser persists them back to the file.
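The handler described above can be sketched as a fixed-capacity window over a `BufferedReader`. This is a minimal sketch. It assumes that lines are consumed strictly in order and that an evicted line no longer needs to be rewritten. `LineWindow`, `advance` and `lines` are hypothetical names, not an existing API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical handler: keeps at most `capacity` lines in memory at any time.
class LineWindow {
    private final BufferedReader reader;
    private final int capacity;
    private final Deque<String> window = new ArrayDeque<>();

    LineWindow(BufferedReader reader, int capacity) {
        this.reader = reader;
        this.capacity = capacity;
    }

    // Reads the next line into the window, evicting the oldest line if the window is full.
    // Returns false once the underlying reader is exhausted.
    boolean advance() throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return false;
        }
        if (window.size() == capacity) {
            window.removeFirst(); // drop the line that is no longer needed
        }
        window.addLast(line);
        return true;
    }

    // Snapshot of the lines currently held in memory, oldest first.
    List<String> lines() {
        return new ArrayList<>(window);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("a\nb\nc\nd\n"));
        LineWindow w = new LineWindow(in, 2);
        while (w.advance()) {
            System.out.println(w.lines());
        }
    }
}
```

An `ArrayDeque` makes both the eviction at the front and the append at the back constant-time. Memory use is therefore bounded by `capacity` lines, whatever the file size.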

Should I:

  • keep the buffer directly as objects instead of as an array of strings (doing a single cast)? Or...
  • keep the buffer as lines and, every time I need to operate on the buffer, cast the data to the right object, make the changes, and persist them back to the file? Sequential operations will need additional casts.

I need to keep things simple. Any suggestions?
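For the first option, each line is converted into an object once, when it enters the buffer, and converted back to text only when it is written out. This is a minimal sketch, assuming a hypothetical two-column format `id,quantity`. The `Row` class and its `parse`/`format` methods are illustrative names only. Note that this step is parsing, not casting: a `String` cannot be cast to `Row`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type for a line of the form "id,quantity".
class Row {
    final String id;
    int quantity; // mutable field that the parser updates in place

    Row(String id, int quantity) {
        this.id = id;
        this.quantity = quantity;
    }

    // Converts one line into an object; this is parsing, not a cast.
    static Row parse(String line) {
        String[] fields = line.split(",", -1);
        return new Row(fields[0], Integer.parseInt(fields[1].trim()));
    }

    // Converts the object back into a line for writing to the file.
    String format() {
        return id + "," + quantity;
    }

    public static void main(String[] args) {
        // The buffer holds parsed objects, so each line is parsed only once.
        List<Row> buffer = new ArrayList<>();
        for (String line : List.of("x,1", "y,2")) {
            buffer.add(Row.parse(line));
        }
        for (Row row : buffer) {
            row.quantity += 10; // several updates can follow without re-parsing
        }
        for (Row row : buffer) {
            System.out.println(row.format());
        }
    }
}
```

With the second option, `parse` would run again for every operation on a line. With this one, it runs once per line and `format` runs once per write.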

Casting doesn't change the amount of memory an object occupies. It only changes the static (compile-time) type of the reference through which you access the object; the object's runtime type stays the same.
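A quick way to see this is to compare the reference before and after the cast. A cast yields the same reference, and the object's class is identical on both sides. A failed downcast throws rather than converting anything. This is a minimal sketch; the class name `CastDemo` is only illustrative.

```java
// Demonstrates that a cast neither copies nor converts the object it is applied to.
class CastDemo {
    public static void main(String[] args) {
        Object a = "hello";
        String b = (String) a; // checked downcast: verifies the runtime type, then reuses the reference

        System.out.println(a == b);       // same object, no copy was made
        System.out.println(a.getClass()); // runtime type is String before the cast...
        System.out.println(b.getClass()); // ...and after it

        Object n = Integer.valueOf(42);
        try {
            String s = (String) n; // the runtime check fails: an Integer is not a String
            System.out.println(s);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: a cast cannot turn an Integer into a String");
        }
    }
}
```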

If you can do those operations on a per-row basis, just do each operation immediately inside the loop in which you read the line.

String line;
// Only the current line is held in memory; process(...) is your own per-row logic.
while ((line = reader.readLine()) != null) {
    line = process(line);
    writer.println(line);
}

This way you only ever hold a single line in memory at a time, instead of the whole file.
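Below is a self-contained version of the loop above. It assumes the output goes to a separate destination, since rewriting a file in place while reading it is not safe. `process` is a hypothetical placeholder for your own per-row transformation; here it simply upper-cases the line so the example runs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class LineByLine {
    // Hypothetical per-row transformation; replace with your own logic.
    static String process(String line) {
        return line.toUpperCase();
    }

    // Streams from `in` to `out`, holding only the current line in memory.
    // Note: PrintWriter never throws IOException; call writer.checkError() if write failures matter.
    static void transform(Reader in, Writer out) throws IOException {
        try (BufferedReader reader = new BufferedReader(in);
             PrintWriter writer = new PrintWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.println(process(line));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // With real files (hypothetical paths) you could pass Files.newBufferedReader(inputPath)
        // and Files.newBufferedWriter(outputPath) instead of the in-memory reader/writer below.
        StringWriter out = new StringWriter();
        transform(new StringReader("a,b\nc,d\n"), out);
        System.out.print(out);
    }
}
```

Taking a `Reader` and a `Writer` rather than file names keeps the loop independent of where the data comes from. The same method then works for files, sockets, or in-memory strings.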

Or, if the operations depend on the entire CSV file (i.e., on all rows), your most efficient bet is to import the CSV file into a real SQL database, use SQL statements to alter the data, and then export it back to a CSV file.

I'd recommend using a MappedByteBuffer (from NIO), which lets you read a file that is too big to fit into memory. It maps only a region of the file into memory; once you're done reading this region (say, the first 10k), map the next one, and so on, until you've read the whole file. It is memory-efficient and quite easy to implement.
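The region-by-region mapping can be sketched with `FileChannel.map`. This is a minimal sketch. It assumes a 10 KB window size and uses counting newlines as a stand-in for real parsing; `MappedChunks` and `countNewlines` are hypothetical names. A line can straddle two regions, so a real parser must carry the partial line over into the next region.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedChunks {
    static final long REGION_SIZE = 10 * 1024; // assumed window size ("say, the first 10k")

    // Maps the file one region at a time and counts '\n' bytes.
    static long countNewlines(Path file) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long position = 0; position < size; position += REGION_SIZE) {
                long length = Math.min(REGION_SIZE, size - position);
                // Only this region is mapped. A mapping stays valid until its buffer is
                // garbage-collected; Java has no public unmap method.
                MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, position, length);
                while (region.hasRemaining()) {
                    if (region.get() == '\n') {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".csv");
        tmp.toFile().deleteOnExit(); // deferred delete: a live mapping can block deletion on some platforms
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            sb.append(i).append(",value\n");
        }
        Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println(countNewlines(tmp));
    }
}
```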

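Java casts like:

Object a = new String();
String b = (String) a; // checked downcast: only verifies the runtime type

are not expensive, whether you cast Strings or any other type: a cast never copies or converts the object; at most it performs a quick runtime type check.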

The real value add will be reading each line as a String, which is easy in Java. Once a line is in a String, it is trivial to split it on each comma with

String[] row = parsedRow.split(",");

Then you will have a String for each value in the array, which you can operate on.
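Here is a small illustration of splitting a row, editing one field, and joining it back. Two details are worth knowing. `split(",")` silently drops trailing empty fields, while `split(",", -1)` keeps them. Neither handles quoted fields that contain commas; for those, a real CSV parser is safer. The class name `SplitDemo`, the helper `updateField` and the sample row are illustrative only.

```java
import java.util.Arrays;

class SplitDemo {
    // Replaces the field at `index` and rebuilds the line, keeping empty trailing fields.
    static String updateField(String line, int index, String newValue) {
        String[] row = line.split(",", -1); // limit -1 keeps trailing empty strings
        row[index] = newValue;
        return String.join(",", row);
    }

    public static void main(String[] args) {
        String parsedRow = "42,widget,,";
        System.out.println(Arrays.toString(parsedRow.split(",")));     // trailing empty fields dropped
        System.out.println(Arrays.toString(parsedRow.split(",", -1))); // all four fields kept
        System.out.println(updateField(parsedRow, 1, "gadget"));
    }
}
```

Using the `-1` limit matters when you write the row back: otherwise a line with empty trailing columns would come out shorter than it went in.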

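Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.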

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM