Compare values in large dataset with ones in small dataset quickly in Java

Goal

I'm attempting to create a Skyblock-like island plugin for Minecraft, and I'd like to be able to calculate an "island level" for every player's own three-dimensional island area.

Situation

I have a small hash map that gives me a decimal value for some subset of the possible combinations of block types and the states those blocks can be in.

The dataset of blocks is large: there are approximately 16777216 blocks, and I need to calculate the sum of their "values", as given by the aforementioned map.

A naive (and very slow) implementation in "Pseudo-Java"

double total = 0;
for (BlockData block : blocks) {
    entries:
    for (Entry<Key, Double> entry : map.entrySet()) {
        Key key = entry.getKey();

        // Type check
        if (!key.getString().equals(block.getString())) continue;

        // States check (only the states explicitly defined by the entry must match)
        States blockStates = block.getStates();
        States keyStates = key.getStates();
        for (Entry<String, String> state : keyStates) {
            // On a mismatch, skip the whole entry rather than just this state
            if (!state.getValue().equals(blockStates.get(state.getKey())))
                continue entries;
        }

        total += entry.getValue();
    }
}

How could I implement the level calculation efficiently?

P.S. Delta encoding isn't viable in this environment, since I can't listen to a block's setter due to API restrictions and obfuscation.

I've managed to trim this down to under one second on the first run, even for 67 million blocks. I figured I'd share my solution here.

Naive approach

My first attempt was to simply run the naive approach on multiple threads (with a thread count equal to or double the number of logical processors available to the program). This was slow, taking about 12 seconds even with a small number of blocks to process.

Finding the bottlenecks

IntelliJ IDEA, the IDE I'm using, has a great tool pre-installed called Java Flight Recorder. In particular, its call tree was immensely helpful in finding and eradicating bottlenecks.

The call tree lists the percentage of program time spent in each method call. It's a nice, easy-to-read view that helps find bottlenecks quickly.

For my specific case, the block data objects were provided by an external API that did not have a way to read specific states. Initially, I tried parsing the string representation of the block data, but this turned out to be a terrible idea from a performance perspective. My solution was to write a wrapper class for the block data, QuickBlockData, which had customised equals and hashCode methods.

The equals method accessed the external API's internal block data instead of the one presented publicly by the API. It only compared state keys that were present in both block data objects.

The hashCode method could not utilise any states without breaking the hash code contract, so it was based only on the type enum of the block data instead of looking at any states.
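As a rough illustration of the equals and hashCode behaviour described above, here is a minimal sketch. It is not the actual QuickBlockData: the BlockType enum and the Map<String, String> of states are hypothetical stand-ins for the external API's internal block data, which the original does not show.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the API's block type enum.
enum BlockType { STONE, OAK_LOG, WHEAT }

// Sketch of a wrapper whose equality ignores states that only one side defines.
final class QuickBlockData {
    private final BlockType type;
    private final Map<String, String> states; // assumed representation of the states

    QuickBlockData(BlockType type, Map<String, String> states) {
        this.type = Objects.requireNonNull(type);
        this.states = Map.copyOf(states);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof QuickBlockData)) return false;
        QuickBlockData that = (QuickBlockData) other;
        if (type != that.type) return false;
        // Compare only the state keys that are present on both sides.
        for (Map.Entry<String, String> state : states.entrySet()) {
            String theirs = that.states.get(state.getKey());
            if (theirs != null && !theirs.equals(state.getValue())) return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        // States cannot contribute here: two objects that compare equal
        // may define different sets of state keys.
        return type.hashCode();
    }
}
```

One caveat with this kind of equality: it is not transitive (a block with no states equals both age=3 and age=7 wheat, which do not equal each other), so it only behaves predictably when used the way the lookup expects, and every block of the same type lands in the same hash bucket.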

FastUtil

I decided on FastUtil's Object2DoubleOpenHashMap for my value map, since it most closely represented the data structure I wanted to have, and also provided a reasonable performance improvement over Java's own hash map implementations.

Instead of using Object2DoubleOpenHashMap directly, I extended it in a ValueMap class, which allowed me to run some preemptive checks in the getter, like checking the type of the input against a HashSet of types that are contained in the map.
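The preemptive type check could look something like the sketch below. To keep it dependency-free, it wraps java.util.HashMap rather than extending FastUtil's Object2DoubleOpenHashMap as the real ValueMap does, so it shows the idea and not the performance. The typeOf function, the String type identifiers, and the method names are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch of a value map that rejects unknown block types before hashing the key.
// Uses java.util.HashMap as a stand-in for FastUtil's Object2DoubleOpenHashMap.
final class ValueMap<K> {
    private final Map<K, Double> values = new HashMap<>();
    private final Set<String> knownTypes = new HashSet<>();
    private final Function<K, String> typeOf; // hypothetical: extracts a key's block type
    private final double defaultValue;

    ValueMap(Function<K, String> typeOf, double defaultValue) {
        this.typeOf = typeOf;
        this.defaultValue = defaultValue;
    }

    void put(K key, double value) {
        values.put(key, value);
        knownTypes.add(typeOf.apply(key)); // remember which types have any value at all
    }

    double get(K key) {
        // Cheap rejection: most blocks have a type that never appears in the map.
        if (!knownTypes.contains(typeOf.apply(key))) return defaultValue;
        Double value = values.get(key);
        return value != null ? value : defaultValue;
    }
}
```

With keys modelled as (type, state) pairs, for example Map.entry("wheat", "age=7"), a lookup for a "stone" key returns the default without ever touching the main map when no stone entry was registered.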

Parallelisation and chunks

With the same API that provides the block data, it is possible to load chunks (16x16 columns of blocks) asynchronously, and then parse them individually. This was incredibly useful, as it meant I could process each chunk as it was loaded.

The API also provided chunk snapshots, which were thread-safe objects that gave some data about the chunk at the time of their creation. Critically, they provided a method to find the highest block in a specific block column inside the chunk, which served as an easy way to cut down on the number of blocks that I needed to process.
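A sketch of how per-chunk processing with a height cutoff might fit together is shown below. The Snapshot interface is a hypothetical, simplified stand-in for the API's chunk snapshot (its method names are invented here), block identifiers are plain strings, and java.util.concurrent.atomic.DoubleAdder is used as one possible thread-safe accumulator.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.function.ToDoubleFunction;

// Hypothetical, simplified view of a thread-safe chunk snapshot.
interface Snapshot {
    int highestBlockY(int x, int z);     // y of the highest block in the column (assumed)
    String blockAt(int x, int y, int z); // some identifier for the block (assumed)
}

final class LevelCalculator {
    static final int CHUNK_WIDTH = 16;

    // Sums one chunk, skipping everything above the highest block of each column.
    static double sumChunk(Snapshot chunk, int minY, ToDoubleFunction<String> valueOf) {
        double sum = 0;
        for (int x = 0; x < CHUNK_WIDTH; x++) {
            for (int z = 0; z < CHUNK_WIDTH; z++) {
                int top = chunk.highestBlockY(x, z);
                for (int y = minY; y <= top; y++) {
                    sum += valueOf.applyAsDouble(chunk.blockAt(x, y, z));
                }
            }
        }
        return sum;
    }

    // Processes each chunk as soon as its load completes and waits for all of them.
    static double sumAll(List<CompletableFuture<Snapshot>> loads, int minY,
                         ToDoubleFunction<String> valueOf) {
        DoubleAdder total = new DoubleAdder();
        CompletableFuture<?>[] done = loads.stream()
                .map(load -> load.thenAcceptAsync(
                        chunk -> total.add(sumChunk(chunk, minY, valueOf))))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(done).join();
        return total.sum();
    }
}
```

Each chunk is summed locally and added to the shared total once, so contention on the accumulator stays low regardless of how many blocks a chunk contains.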

Summary

I used:

  • A wrapper class for the block data, with a performant equals method that did what I wanted (although it forced me to use a terrible hash code)
  • A ValueMap class, which extended it.unimi.dsi.fastutil.objects.Object2DoubleOpenHashMap, allowing me to get the value for some block data quickly, with a customisable default value, and with preemptive checks for the type of the block.
  • Parallel computation and chunking provided by an external API, with an atomic double to accumulate the total value of an area.
