
Large amount of constants in Java

I need to include about 1 MByte of data in a Java application, for very fast and easy access in the rest of the source code. My main background is not Java, so my initial idea was to convert the data directly to Java source code, defining 1 MByte of constant arrays, classes (instead of C++ structs) etc., something like this:

public final/immutable/const MyClass MyList[] = { 
  { 23012, 22, "Hamburger"} , 
  { 28375, 123, "Kieler"}
};

However, it seems that Java does not support such constructs. Is this correct? If yes, what is the best solution to this problem?

NOTE: The data consists of 2 tables, each with about 50000 records, which are to be searched in various ways. This may require some indexes later, with significantly more records, maybe 1 million records, saved this way. I expect the application to start up very fast, without iterating through these records.

I personally wouldn't put it in source form.

Instead, include the data in some appropriate raw format in your jar file (I'm assuming you'll be packaging the application or library up) and use Class.getResourceAsStream or ClassLoader.getResourceAsStream to load it.

You may very well want a class to encapsulate loading, caching and providing this data, but I don't see much benefit from converting it into source code.
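A minimal sketch of such an encapsulating class. The resource name /records.txt, the class name RecordStore and the "id count name" line layout are assumptions made for illustration only; they are not part of the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder that loads the records once and caches them.
final class RecordStore {
    // Assumed resource path at the root of the jar; adjust to your packaging.
    static final String RESOURCE = "/records.txt";

    private static List<String[]> cache;

    // Loads the resource on first call and returns the cached list afterwards.
    static synchronized List<String[]> records() throws IOException {
        if (cache == null) {
            try (InputStream in = RecordStore.class.getResourceAsStream(RESOURCE)) {
                if (in == null) { // getResourceAsStream returns null when nothing is found
                    throw new IOException("Resource not found: " + RESOURCE);
                }
                cache = parse(in);
            }
        }
        return cache;
    }

    // Parses space-separated "id count name" lines from any stream.
    static List<String[]> parse(InputStream in) throws IOException {
        List<String[]> out = new ArrayList<>();
        BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = br.readLine()) != null) {
            if (!line.isEmpty()) {
                out.add(line.split(" ", 3)); // at most 3 fields, so names may contain spaces
            }
        }
        return Collections.unmodifiableList(out);
    }
}
```

Callers would use RecordStore.records() and never touch the stream themselves; keeping parse separate makes it easy to feed it test data from memory.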

Due to limitations of the Java class file format, this much data does not fit into source form: the bytecode of a single method, including the static initializer that fills constant arrays, cannot exceed 64 KB, iirc. (Class files are simply not intended for this type of data.)

I would load the data upon starting the program, using something like the following lines of code:

import java.io.*;
import java.util.*;

public class Test {
    public static void main(String... args) throws IOException {
        List<DataRecord> records = new ArrayList<DataRecord>();
        // try-with-resources closes the reader even if parsing fails
        try (BufferedReader br = new BufferedReader(new FileReader("data.txt"))) {
            String s;
            while ((s = br.readLine()) != null) {
                // each line: two integers and a string, separated by spaces
                String[] arr = s.split(" ");
                int i = Integer.parseInt(arr[0]);
                int j = Integer.parseInt(arr[1]);
                records.add(new DataRecord(i, j, arr[2]));
            }
        }
    }
}


class DataRecord {
    public final int i, j;
    public final String s;
    public DataRecord(int i, int j, String s) {
        this.i = i;
        this.j = j;
        this.s = s;
    }
}

(NB: The Scanner is quite slow, so don't be tempted to use it just because it has a simple interface. Stick with some form of BufferedReader and split, or StringTokenizer.)

Efficiency can of course be improved if you transform the data into a binary format. In that case, you can make use of DataInputStream (but don't forget to wrap the underlying stream in a BufferedInputStream).
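A rough sketch of the binary variant. The layout (a record count followed by int, int, UTF string per record) and the names BinRecord and BinaryRecords are assumptions chosen for this example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type used only for this illustration.
final class BinRecord {
    final int i, j;
    final String s;
    BinRecord(int i, int j, String s) { this.i = i; this.j = j; this.s = s; }
}

final class BinaryRecords {
    // Assumed layout: record count (int), then per record: int, int, UTF string.
    static void write(List<BinRecord> records, OutputStream raw) throws IOException {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(raw));
        out.writeInt(records.size());
        for (BinRecord r : records) {
            out.writeInt(r.i);
            out.writeInt(r.j);
            out.writeUTF(r.s);
        }
        out.flush(); // push buffered bytes through to the underlying stream
    }

    // Reads the records back; the buffer avoids one system call per field.
    static List<BinRecord> read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        int n = in.readInt();
        List<BinRecord> records = new ArrayList<>(n);
        for (int k = 0; k < n; k++) {
            int i = in.readInt();
            int j = in.readInt();
            records.add(new BinRecord(i, j, in.readUTF()));
        }
        return records;
    }
}
```

The write side would normally run once at build time, so that the application only ever executes read.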

Depending on how you wish to access the data, you might be better off storing the records in a hash map (HashMap<Integer, DataRecord>), with i or j as the key.

If you wish to load the data at the same time as the JVM loads the class file itself (roughly!), you could do the read / initialization not within a method, but encapsulated in static { ... }.
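A small sketch of the static { ... } variant. The class name Constants and the resource name /data.txt are assumptions; in this sketch a missing resource simply yields an empty list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Constants {
    // Filled exactly once, when the JVM initializes this class (on first use).
    static final List<String> LINES;

    static {
        List<String> lines = new ArrayList<>();
        // "/data.txt" is an assumed resource name.
        try (InputStream in = Constants.class.getResourceAsStream("/data.txt")) {
            if (in != null) {
                BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
                String line;
                while ((line = br.readLine()) != null) {
                    lines.add(line);
                }
            }
        } catch (IOException e) {
            // A static initializer cannot throw checked exceptions.
            throw new UncheckedIOException(e);
        }
        LINES = Collections.unmodifiableList(lines);
    }
}
```

Any code that reads Constants.LINES triggers the load implicitly, so there is no explicit "init" call to forget.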


For a memory-mapped approach, have a look at the java.nio.channels package. Especially the method

public abstract MappedByteBuffer map(FileChannel.MapMode mode, long position,long size) throws IOException

Complete code examples can be found here.
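For orientation, a minimal sketch of calling map, assuming a file that contains nothing but big-endian 4-byte ints. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedInts {
    // Maps the whole file read-only; nothing is parsed up front.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // Reads the n-th int (4 bytes each, big-endian by default) via an absolute get.
    static int intAt(MappedByteBuffer buf, int n) {
        return buf.getInt(n * 4);
    }
}
```

Startup cost is then roughly that of the map call; pages are brought in by the operating system when they are first touched.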


Dan Bornstein (the lead developer of DalvikVM) explains a solution to your problem in this talk (look around 0:30:00). However, I doubt the solution applies to as much data as a megabyte.

One idea is to use enums, but I'm not sure if this suits your implementation, and it also depends on how you are planning to use the data.

public enum Stuff {

 HAMBURGER (23012, 22),
 KIELER    (28375, 123);

 private final int a;
 private final int b;

 // The constructor is private and is never called explicitly.
 private Stuff(int a, int b) {
    this.a = a;
    this.b = b;
  }

 public int getAvalue() {
   return this.a;
 }

 public int getBvalue() {
   return this.b;
 }

}

These can then be accessed like:

Stuff someThing = Stuff.HAMBURGER;
int hamburgerA = Stuff.HAMBURGER.getAvalue(); // = 23012

Another idea is using a static initializer to set private fields of a class.

Putting the data into source code would actually not be the fastest solution, not by a long shot. Loading a Java class is quite complex and slow (at least on a platform that does bytecode verification, not sure about Android).

The fastest possible way to do this would be to define your own binary index format. You could then read that as a byte[] (possibly using memory mapping) or even a RandomAccessFile without interpreting it in any way until you start accessing it. The cost of this would be the complexity of the code that accesses it. With fixed-size records, a sorted list of records that's accessed via binary search would still be pretty simple, but anything else is going to get ugly.
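To illustrate the fixed-size case, here is a sketch of a binary search that works directly on the raw bytes. It assumes 8-byte records (int key, int value) sorted by ascending key; that layout and the class name are assumptions for the example.

```java
import java.nio.ByteBuffer;

final class SortedRecords {
    static final int RECORD_SIZE = 8; // assumed layout: int key, int value

    // Returns the value stored with key, or null if the key is absent.
    // buf holds whole records from index 0 up to its limit, sorted by key.
    static Integer find(ByteBuffer buf, int key) {
        int lo = 0;
        int hi = buf.limit() / RECORD_SIZE - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int k = buf.getInt(mid * RECORD_SIZE); // absolute read, nothing is copied
            if (k < key) {
                lo = mid + 1;
            } else if (k > key) {
                hi = mid - 1;
            } else {
                return buf.getInt(mid * RECORD_SIZE + 4);
            }
        }
        return null;
    }
}
```

The buffer can come from ByteBuffer.wrap(byte[]) or from a memory-mapped file; the search code is the same in both cases.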

Though before doing that, are you sure this isn't premature optimization? The easiest (and probably still quite fast) solution would be to just serialize a Map, List or array. Have you tried this and determined that it is, in fact, too slow?
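The serialization route could look roughly like this. The class name and the Integer-to-String map are assumptions for illustration; a real application would stream to and from a file or jar resource rather than a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

final class SerializedMap {
    // Serializes the map once, e.g. at build time.
    static byte[] toBytes(HashMap<Integer, String> map) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(map);
        }
        return bos.toByteArray();
    }

    // Restores the map at startup; only deserialize data you produced yourself.
    @SuppressWarnings("unchecked")
    static Map<Integer, String> fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Map<Integer, String>) in.readObject();
        }
    }
}
```

Timing fromBytes on the real data set is a cheap way to find out whether anything more elaborate is needed.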

convert the data directly to Java source code, defining 1MByte of constant arrays, classes

Be aware that there are strict constraints on the size of classes and their structures (ref: JVM Spec).

This is how you define it in Java, if I understood what you are after:

public final Object[][] myList = { 
          { 23012, 22, "Hamburger"} , 
          { 28375, 123, "Kieler"}
        };

It looks like you plan to write your own lightweight database.
If you can limit the length of the String to a realistic max size, the following might work:

  • Write each entry into a binary file. The entries all have the same size, so you waste some bytes with each entry (int a, int b, int stringsize, string, padding).
  • To read an entry, open the file as a random access file, multiply the index by the length of an entry to get the offset, and seek to that position.
  • Put the bytes into a ByteBuffer and read the values. The String has to be converted with the String(byte[], int start, int length, Charset) constructor.

If you can't limit the length of a block, dump the strings in an additional file and only store the offsets in your table. This requires an additional file access and makes modifying the data hard.
Some information about random file access in Java can be found here: http://java.sun.com/docs/books/tutorial/essential/io/rafs.html .

For faster access you can cache some of your read entries in a HashMap and always remove the oldest from the map when reading a new one.
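A sketch of this idea (UTF-8 as the string encoding is an assumption):

import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

class MyDataStore
{
   static final int ENTRY_LENGTH = 100; // bytes per fixed-size entry
   static final int CACHE_SIZE = 50000; // maximum number of cached entries
   FileChannel fc = null;
   Map<Integer, Entry> myCache = new HashMap<Integer, Entry>();
   ArrayDeque<Integer> queue = new ArrayDeque<Integer>(); // cached indexes, oldest first

   void open(Path p) throws IOException { fc = FileChannel.open(p, StandardOpenOption.READ); }
   void close() throws IOException { fc.close(); fc = null; }

   Entry getEntryAt(int index) throws IOException
   {
       Entry cached = myCache.get(index);
       if (cached != null) return cached;

       long pos = (long) index * ENTRY_LENGTH; // long arithmetic avoids int overflow
       ByteBuffer b = ByteBuffer.allocate(ENTRY_LENGTH);
       while (b.hasRemaining()) // a single read may return fewer bytes than requested
       {
           if (fc.read(b, pos + b.position()) < 0) throw new EOFException("no entry at index " + index);
       }
       b.flip(); // switch the buffer from filling to reading
       Entry a = new Entry(b);
       queue.add(index);
       myCache.put(index, a);
       if (queue.size() > CACHE_SIZE) myCache.remove(queue.remove()); // evict the oldest entry
       return a;
   }
}

class Entry
{
   int a; int b; String s;
   public Entry(ByteBuffer bb)
   {
       a = bb.getInt();
       b = bb.getInt();
       int size = bb.getInt();
       byte[] bin = new byte[size];
       bb.get(bin);
       s = new String(bin, StandardCharsets.UTF_8); // assumes strings were written as UTF-8
   }
}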

Missing from the code above:

  • writing, since you need it for constant data
  • total number of entries / size of the file: this only needs an additional integer at the beginning of the file and an additional 4-byte offset for each access operation.
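The missing writing part could be sketched as follows, using the entry layout described above (int a, int b, int string size, string bytes, padding) and the same 100-byte entry length. UTF-8 as the encoding and the name EntryWriter are assumptions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

final class EntryWriter {
    static final int ENTRY_LENGTH = 100; // must match the entry length used for reading

    // Encodes one entry as exactly ENTRY_LENGTH bytes, zero-padded at the end.
    static ByteBuffer encode(int a, int b, String s) {
        byte[] text = s.getBytes(StandardCharsets.UTF_8); // UTF-8 is an assumption
        if (text.length > ENTRY_LENGTH - 12) { // 12 bytes are taken by the three ints
            throw new IllegalArgumentException("String too long for a fixed-size entry: " + s);
        }
        ByteBuffer buf = ByteBuffer.allocate(ENTRY_LENGTH); // freshly allocated buffers are zero-filled
        buf.putInt(a).putInt(b).putInt(text.length).put(text);
        buf.clear(); // position 0, limit ENTRY_LENGTH, so the padding is written as well
        return buf;
    }

    // Writes one entry at the channel's current position.
    static void write(FileChannel ch, int a, int b, String s) throws IOException {
        ByteBuffer buf = encode(a, b, s);
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }
}
```

A build step would open the output with FileChannel.open(path, StandardOpenOption.CREATE, StandardOpenOption.WRITE) and call write once per record, in index order.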

I suggest using assets to store this data.

You could also declare a static class (or a set of static classes) exposing the desired values as methods. After all, you want your code to be able to find the value for a given name, and don't want the value to change.

So: location=MyLibOfConstants.returnHamburgerLocation().zipcode

And you can store this stuff in a hashtable with lazy initialization, if you think that calculating it on the fly would be a waste of time.

Isn't a cache what you need? Like classes, it is loaded into memory, is not really limited to a defined size, and should be as fast as using constants. It can even look data up with some kind of index (for example the object's hash code). You could, for example, create all your data items (e.g. { 23012, 22, "Hamburger" }) and then create 3 hash maps: map1.put(23012, hamburgerItem); map2.put(22, hamburgerItem); map3.put("Hamburger", hamburgerItem); This way you can search very fast in one of the maps according to the parameter you have. (This works only if your keys are unique within each map; it is just an example that could inspire you.)
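Spelled out, the three-map idea might look like this. The field names id, count and name are guesses at what the three columns mean, and Item / ItemIndex are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical item type; the field names are assumptions.
final class Item {
    final int id, count;
    final String name;
    Item(int id, int count, String name) { this.id = id; this.count = count; this.name = name; }
}

final class ItemIndex {
    // One map per lookup key; all three point at the same Item objects.
    private final Map<Integer, Item> byId = new HashMap<>();
    private final Map<Integer, Item> byCount = new HashMap<>();
    private final Map<String, Item> byName = new HashMap<>();

    // Registers the item under all three keys (a later duplicate key overwrites an earlier one).
    void add(Item item) {
        byId.put(item.id, item);
        byCount.put(item.count, item);
        byName.put(item.name, item);
    }

    Item findById(int id) { return byId.get(id); }
    Item findByCount(int count) { return byCount.get(count); }
    Item findByName(String name) { return byName.get(name); }
}
```

Because the maps only hold references, three indexes cost three sets of map entries, not three copies of the data.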

At work we have a very big webapp (80 WebLogic instances) and it's almost what we do: caching everywhere. From a country list in the database, create a cache...

There are many different kinds of caches; you should check the link and choose what you need: http://en.wikipedia.org/wiki/Cache_algorithms

Java serialization sounds like something that needs to be parsed... not good. Isn't there some kind of standard format for storing data in a stream, that can be read/looked up using a standard API without parsing it?

If you were to create the data in code, then it would all be loaded on first use. This is unlikely to be much more efficient than loading from a separate file: as well as parsing the data in the class file, the JVM has to verify and compile the bytecodes that create each object, a million times over, rather than just once if you load the data in a loop.

If you want random access and can't use a memory-mapped file, then there is RandomAccessFile, which might work. You need either to load an index on start, or to make the entries a fixed length.
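A sketch of the index variant with variable-length entries. The record layout (int, int, UTF string) and the class name are assumptions; the offsets returned by append are what an index loaded on start would contain.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class VariableRecords {
    // Appends one record at the end of the file and returns its start offset.
    static long append(RandomAccessFile raf, int a, int b, String s) throws IOException {
        long offset = raf.length();
        raf.seek(offset);
        raf.writeInt(a);
        raf.writeInt(b);
        raf.writeUTF(s); // 2-byte length prefix followed by the encoded string
        return offset;
    }

    // Jumps straight to a known offset and reads only that record's string.
    static String nameAt(RandomAccessFile raf, long offset) throws IOException {
        raf.seek(offset + 8); // skip the two 4-byte ints
        return raf.readUTF();
    }
}
```

With fixed-length entries the index disappears, because the offset is just the entry number multiplied by the entry length.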

You might want to check whether the HDF5 libraries run on your platform; it may be overkill for such a simple and small dataset though.
