How can I approach this CP task?

Question

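The task (from a Bulgarian judge; click "Език" to switch it to English):

I am given the size of the first of N corals (S_1 = A). The size of every subsequent coral (S_i, where i > 1) is computed with the formula (B*S_{i-1} + C) % D, where A, B, C and D are constants. I am told that Nemo is near the K-th coral (when the sizes of all corals are sorted in ascending order).

What is the size of that K-th coral?

There will be T tests; for each of them I am given N, K, A, B, C and D and asked to output the size of the K-th coral.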
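As a small illustration with made-up constants (not taken from the judge): for A=1, B=2, C=3, D=10 and N=5 the sizes are 1, 5, 3, 9, 1, which sort to 1, 1, 3, 5, 9, so K=3 gives 3. A brute-force reference along these lines could look like the sketch below; the function name `kth_coral_bruteforce` is hypothetical, and the approach only works when all N values fit in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Brute-force reference: materialise all N sizes, then select the K-th smallest.
// Only suitable when N * 8 bytes fits in memory; the name is illustrative.
std::int64_t kth_coral_bruteforce(int n, int k, std::int64_t a, std::int64_t b,
                                  std::int64_t c, std::int64_t d) {
    assert(1 <= k && k <= n);
    std::vector<std::int64_t> sizes;
    sizes.reserve(n);
    std::int64_t s = a;
    for (int i = 0; i < n; ++i) {
        sizes.push_back(s);
        s = (b * s + c) % d;  // S_i = (B * S_{i-1} + C) % D
    }
    // Partial sort: afterwards position k-1 holds the K-th smallest value.
    std::nth_element(sizes.begin(), sizes.begin() + (k - 1), sizes.end());
    return sizes[k - 1];
}
```

Calling `kth_coral_bruteforce(5, 3, 1, 2, 3, 10)` corresponds to the toy case above. A reference like this is mainly useful for cross-checking a memory-frugal solution on small inputs.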

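Constraints:

1 ≤ T ≤ 3

1 ≤ K ≤ N ≤ 10^7

0 ≤ A < D ≤ 10^18

1 ≤ C, B*D ≤ 10^18

Available memory is 64 MB

Time limit is 1.9 seconds

The problem I have:

In the worst case I would need 10^7 * 8 B, which is about 76 MB.

If at least 80 MB of memory were available, the solution would be: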

#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>

using biggie = long long;

int main() {
    int t;
    std::cin >> t;
    int i, n, k, j;
    biggie a, b, c, d;
    std::vector<biggie>::iterator it_ans;
    for (i = 0; i != t; ++i) {
        std::cin >> n >> k >> a >> b >> c >> d;
        std::vector<biggie> lut{ a };
        lut.reserve(n);
        for (j = 1; j != n; ++j) {
            lut.emplace_back((b * lut.back() + c) % d);
        }
        it_ans = std::next(lut.begin(), k - 1);
        std::nth_element(lut.begin(), it_ans, lut.end());
        std::cout << *it_ans << '\n';
    }
    return 0;
}

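Question 1: Given the constraints above, how can I approach this CP task?

Question 2: Is it possible to solve it with std::nth_element even though I cannot store all N elements? I mean using std::nth_element in a sliding-window fashion (if that is possible).

@Christian Sloper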

#include <iostream>
#include <queue>

using biggie = long long;

int main() {
    int t;
    std::cin >> t;
    int i, n, k, j, j_lim;
    biggie a, b, c, d, prev, curr;
    for (i = 0; i != t; ++i) {
        std::cin >> n >> k >> a >> b >> c >> d;
        if (k < n - k + 1) {
            std::priority_queue<biggie, std::vector<biggie>, std::less<biggie>> q;
            q.push(a);
            prev = a;
            for (j = 1; j != k; ++j) {
                curr = (b * prev + c) % d;
                q.push(curr);
                prev = curr;
            }
            for (; j != n; ++j) {
                curr = (b * prev + c) % d;
                if (curr < q.top()) {
                    q.pop();
                    q.push(curr);
                }
                prev = curr;
            }
            std::cout << q.top() << '\n';
        }
        else {
            std::priority_queue<biggie, std::vector<biggie>, std::greater<biggie>> q;
            q.push(a);
            prev = a;
            for (j = 1, j_lim = n - k + 1; j != j_lim; ++j) {
                curr = (b * prev + c) % d;
                q.push(curr);
                prev = curr;
            }
            for (; j != n; ++j) {
                curr = (b * prev + c) % d;
                if (curr > q.top()) {
                    q.pop();
                    q.push(curr);
                }
                prev = curr;
            }
            std::cout << q.top() << '\n';
        }
    }
    return 0;
}

Answer 1

This got accepted (it passed all 40 tests; the longest time was 1.4 seconds, for a test with T=3 and D≤10^9, and the longest time for tests with larger D, and hence T=1, was 0.7 seconds).

#include <iostream>

using biggie = long long;

int main() {
    int t;
    std::cin >> t;
    int i, n, k, j;
    biggie a, b, c, d;
    for (i = 0; i != t; ++i) {
        std::cin >> n >> k >> a >> b >> c >> d;
        biggie prefix = 0;
        for (int shift = d > 1000000000 ? 40 : 20; shift >= 0; shift -= 20) {
            biggie prefix_mask = ((biggie(1) << (40 - shift)) - 1) << (shift + 20);
            int count[1 << 20] = {0};
            biggie s = a;
            int rank = 0;
            for (j = 0; j != n; ++j) {
                biggie s_vs_prefix = s & prefix_mask;
                if (s_vs_prefix < prefix)
                    ++rank;
                else if (s_vs_prefix == prefix)
                    ++count[(s >> shift) & ((1 << 20) - 1)];
                s = (b * s + c) % d;
            }
            int i = -1;
            while (rank < k)
                rank += count[++i];
            prefix |= biggie(i) << shift;
        }
        std::cout << prefix << '\n';
    }
    return 0;
}

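The result is a 60-bit number. I first determine the high 20 bits with one pass over the numbers, then the middle 20 bits in another pass, and then the low 20 bits in a third pass.

For the high 20 bits, generate all the numbers and count how often each high-20-bit pattern occurs. Then add up the counts until you reach K. The pattern at which you reach K is the one that covers the K-th smallest number; in other words, it is the high 20 bits of the result.

The middle and low 20 bits are computed similarly, except that we take into account the prefix known at that point (the high 20 bits, or the high + middle 40 bits). As a small optimization, I skip computing the high 20 bits when D is small. That got me from 2.1 seconds down to 1.4 seconds.

This solution is like the one user3386109 described, except that the bucket size is 2^20 instead of 10^6, so I can use bit operations instead of divisions and think in bit patterns instead of ranges.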
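The three passes described above can also be written as a standalone function, which makes it easy to cross-check against a sorted copy on small inputs. The following is a sketch under a few assumptions: the function name `radix_select` is made up, the counting array lives on the heap rather than on the stack, and the small-D shortcut is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Multi-pass radix selection over a regenerated sequence (illustrative sketch).
// Returns the k-th smallest (1-based) of S_1..S_n with S_1 = a and
// S_i = (b * S_{i-1} + c) % d. Assumes every value is below 2^60,
// which holds in the task because D <= 10^18 < 2^60.
std::int64_t radix_select(int n, int k, std::int64_t a, std::int64_t b,
                          std::int64_t c, std::int64_t d) {
    assert(1 <= k && k <= n);
    constexpr int kDigitBits = 20;
    constexpr std::int64_t kDigitMask = (std::int64_t{1} << kDigitBits) - 1;
    std::vector<int> count(std::size_t{1} << kDigitBits);  // about 4 MB on the heap
    std::int64_t prefix = 0;  // bits of the answer determined so far
    for (int shift = 40; shift >= 0; shift -= kDigitBits) {
        std::fill(count.begin(), count.end(), 0);
        const std::int64_t known = prefix >> (shift + kDigitBits);
        int rank = 0;  // values whose higher bits are below the known prefix
        std::int64_t s = a;
        for (int j = 0; j < n; ++j) {
            const std::int64_t high = s >> (shift + kDigitBits);
            if (high < known) {
                ++rank;
            } else if (high == known) {
                ++count[(s >> shift) & kDigitMask];
            }
            s = (b * s + c) % d;
        }
        // Walk the buckets in ascending order until the running total reaches k.
        int digit = -1;
        while (rank < k) rank += count[++digit];
        prefix |= static_cast<std::int64_t>(digit) << shift;
    }
    return prefix;
}
```

Memory use is dominated by the 2^20 counters, independent of N, at the cost of regenerating the sequence once per pass.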

Answer 2

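For the memory constraint you ran into:

(B*S_{i-1} + C) % D

Each value needs only the value immediately before it (S_{i-1}). So you can handle the values in pairs and store only 1/2 of the total you need: index just the even-numbered values and iterate once more to get each odd-numbered one. That way you can use a half-length LUT and compute the odd values on the fly. Modern CPUs are fast enough for this kind of extra computation.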

std::vector<biggie> lut{ a_i,a_i_2,a_i_4,... };
a_i_3=computeOddFromEven(lut[1]);

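You can also take longer strides, such as 4 or 8. If the dataset is large, RAM latency will be significant, so this is like placing checkpoints throughout the data search space to balance memory use against core use. Checkpoints 1000 apart would spend a lot of CPU cycles on recomputation, but the array would then fit in the CPU's L2/L1 cache, which is not bad. When sorting, the maximum number of recomputation iterations per element would then be 1000, i.e. O(1000 x size). That may be a big constant, but if some of the constants really are const, the compiler might be able to optimize it somehow?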
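A minimal sketch of the checkpoint idea follows, assuming a hypothetical class name `CheckpointedSequence` and a 0-based index. It stores every `stride`-th value and replays the recurrence for the rest, so it trades at most `stride - 1` extra steps per read for a `stride`-fold smaller array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Checkpointed, read-only view of the sequence (names are illustrative):
// keeps every `stride`-th value and recomputes the others on demand.
class CheckpointedSequence {
public:
    CheckpointedSequence(int n, int stride, std::int64_t a, std::int64_t b,
                         std::int64_t c, std::int64_t d)
        : n_(n), stride_(stride), b_(b), c_(c), d_(d) {
        assert(n >= 1 && stride >= 1);
        std::int64_t s = a;
        for (int i = 0; i < n; ++i) {
            if (i % stride == 0) checkpoints_.push_back(s);
            s = (b * s + c) % d;
        }
    }

    // Value at 0-based index i; costs at most stride - 1 recurrence steps.
    std::int64_t at(int i) const {
        assert(0 <= i && i < n_);
        std::int64_t s = checkpoints_[i / stride_];
        for (int step = i % stride_; step > 0; --step) s = (b_ * s + c_) % d_;
        return s;
    }

    // Number of values actually kept in memory.
    std::size_t stored() const { return checkpoints_.size(); }

private:
    int n_;
    int stride_;
    std::int64_t b_, c_, d_;
    std::vector<std::int64_t> checkpoints_;
};
```

Note that this only gives read access by index. An in-place algorithm such as std::nth_element needs to permute the elements, so the view would have to be combined with a selection method that only reads values.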

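If CPU performance becomes a problem again:

1. Write a compile function that writes out your source code, as a string, together with all the "constants" the user supplied
2. Compile that code from the command line (assuming the target computer can reach some compiler such as g++ from the command line, from within the main program)
3. Run it and get the result

The compiler should be able to enable more speed/memory optimizations when these values are truly constant at compile time rather than depending on std::cin.

If you really need a hard limit on RAM usage, implement a simple cache with your heavy computation as its backing store, using brute force O(N^2) (or O(L x N) with a checkpoint every L elements, as in the first method, where L=2 or 4, or ...).

Here is an example of a direct-mapped cache with room for 8M long long values: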

int main()
{
    std::vector<long long> checkpoints = { 
           a_0, a_16, a_32,...
    };
    auto cacheReadMissFunction = [&](int key){
        // your pure computational algorithm here, helper meant to show variable 
        long long result = checkpoints[key/16];  
        for (int step = 0; step < key % 16; ++step) // replay the steps after the checkpoint
            result = iterate(result);
        return result;
    };
    auto cacheWriteMissFunction = [&](int key, long long value){
        /* not useful for your algorithm as it doesn't change behavior per element */
        // backing_store[key] = value;
    };    

    // due to special optimizations, size has to be 2^k
    int cacheSize = 1024*1024*8;
    DirectMappedCache<int, long long> cache(cacheSize,cacheReadMissFunction,cacheWriteMissFunction);
    std::cout << cache.get(20)<<std::endl;
    return 0;
}

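If you use a cache-friendly sorting algorithm, direct cache access lets almost all elements be heavily reused in comparisons, provided you fill the output buffer/terminal with elements one by one by following something like a bitonic-sort path (which is known at compile time). If that does not work, you can try using a file as the "backing store" of the cache, so that the whole array can be sorted at once. Is use of the filesystem forbidden? Then the online-compilation approach above will not work either.

Implementation of the direct-mapped cache (if you use any cache.set() method, don't forget to call flush() once your algorithm has finished):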

#ifndef DIRECTMAPPEDCACHE_H_
#define DIRECTMAPPEDCACHE_H_

#include<vector>
#include<functional>
#include<mutex>
#include<iostream>

/* Direct-mapped cache implementation
 * Only usable for integer type keys in range [0,maxPositive-1]
 *
 * CacheKey: type of key (only integers: int, char, size_t)
 * CacheValue: type of value that is bound to key (same as above)
 */
template<   typename CacheKey, typename CacheValue>
class DirectMappedCache
{
public:
    // allocates buffers for numElements number of cache slots/lanes
    // readMiss:    cache-miss for read operations. User needs to give this function
    //              to let the cache automatically get data from backing-store
    //              example: [&](MyClass key){ return redis.get(key); }
    //              takes a CacheKey as key, returns CacheValue as value
    // writeMiss:   cache-miss for write operations. User needs to give this function
    //              to let the cache automatically set data to backing-store
    //              example: [&](MyClass key, MyAnotherClass value){ redis.set(key,value); }
    //              takes a CacheKey as key and CacheValue as value
    // numElements: has to be integer-power of 2 (e.g. 2,4,8,16,...)
    DirectMappedCache(CacheKey numElements,
                const std::function<CacheValue(CacheKey)> & readMiss,
                const std::function<void(CacheKey,CacheValue)> & writeMiss):size(numElements),sizeM1(numElements-1),loadData(readMiss),saveData(writeMiss)
    {
        // initialize buffers
        for(size_t i=0;i<numElements;i++)
        {
            valueBuffer.push_back(CacheValue());
            isEditedBuffer.push_back(0);
            keyBuffer.push_back(CacheKey()-1);// mapping of 0+ allowed
        }
    }



    // get element from cache
    // if cache doesn't find it in buffers,
    // then cache gets data from backing-store
    // then returns the result to user
    // then cache is available from RAM on next get/set access with same key
    inline
    const CacheValue get(const CacheKey & key)  noexcept
    {
        return accessDirect(key,nullptr);
    }

    // only syntactic difference
    inline
    const std::vector<CacheValue> getMultiple(const std::vector<CacheKey> & key)  noexcept
    {
        const int n = key.size();
        std::vector<CacheValue> result(n);

        for(int i=0;i<n;i++)
        {
            result[i]=accessDirect(key[i],nullptr);
        }
        return result;
    }


    // thread-safe but slower version of get()
    inline
    const CacheValue getThreadSafe(const CacheKey & key)  noexcept
    {
        std::lock_guard<std::mutex> lg(mut);
        return accessDirect(key,nullptr);
    }

    // set element to cache
    // if cache doesn't find it in buffers,
    // then cache sets data on just cache
    // writing to backing-store only happens when
    //                  another access evicts the cache slot containing this key/value
    //                  or when cache is flushed by flush() method
    // then returns the given value back
    // then cache is available from RAM on next get/set access with same key
    inline
    void set(const CacheKey & key, const CacheValue & val) noexcept
    {
        accessDirect(key,&val,1);
    }

    // thread-safe but slower version of set()
    inline
    void setThreadSafe(const CacheKey & key, const CacheValue & val)  noexcept
    {
        std::lock_guard<std::mutex> lg(mut);
        accessDirect(key,&val,1);
    }

    // use this before closing the backing-store to store the latest bits of data
    void flush()
    {
        try
        {
            std::lock_guard<std::mutex> lg(mut);
            for (size_t i=0;i<size;i++)
            {
                if (isEditedBuffer[i] == 1)
                {
                    isEditedBuffer[i]=0;
                    auto oldKey = keyBuffer[i];
                    auto oldValue = valueBuffer[i];
                    saveData(oldKey,oldValue);
                }
            }
        }catch(std::exception &ex){ std::cout<<ex.what()<<std::endl; }
    }

    // direct mapped access
    // opType=0: get
    // opType=1: set
    CacheValue const accessDirect(const CacheKey & key,const CacheValue * value, const bool opType = 0)
    {

        // find tag mapped to the key
        CacheKey tag = key & sizeM1;

        // compare keys
        if(keyBuffer[tag] == key)
        {
            // cache-hit

            // "set"
            if(opType == 1)
            {
                isEditedBuffer[tag]=1;
                valueBuffer[tag]=*value;
            }

            // cache hit value
            return valueBuffer[tag];
        }
        else // cache-miss
        {
            CacheValue oldValue = valueBuffer[tag];
            CacheKey oldKey = keyBuffer[tag];

            // eviction algorithm start
            if(isEditedBuffer[tag] == 1)
            {
                // if it is "get"
                if(opType==0)
                {
                    isEditedBuffer[tag]=0;
                }

                saveData(oldKey,oldValue);

                // "get"
                if(opType==0)
                {
                    const CacheValue && loadedData = loadData(key);
                    valueBuffer[tag]=loadedData;
                    keyBuffer[tag]=key;
                    return loadedData;
                }
                else /* "set" */
                {
                    valueBuffer[tag]=*value;
                    keyBuffer[tag]=key;
                    return *value;
                }
            }
            else // not edited
            {
                // "set"
                if(opType == 1)
                {
                    isEditedBuffer[tag]=1;
                }

                // "get"
                if(opType == 0)
                {
                    const CacheValue && loadedData = loadData(key);
                    valueBuffer[tag]=loadedData;
                    keyBuffer[tag]=key;
                    return loadedData;
                }
                else // "set"
                {
                    valueBuffer[tag]=*value;
                    keyBuffer[tag]=key;
                    return *value;
                }
            }

        }
    }


private:
    const CacheKey size;
    const CacheKey sizeM1;
    std::mutex mut;

    std::vector<CacheValue> valueBuffer;
    std::vector<unsigned char> isEditedBuffer;
    std::vector<CacheKey> keyBuffer;
    const std::function<CacheValue(CacheKey)>  loadData;
    const std::function<void(CacheKey,CacheValue)>  saveData;

};


#endif /* DIRECTMAPPEDCACHE_H_ */

Answer 3

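You can solve this with a max-heap.

1. Insert the first K elements into a max-heap. The largest of these K elements will now be at the root.
2. For each remaining element e, compare e with the root. If e is larger than the root, discard it. If e is smaller than the root, remove the root and insert e into the heap.
3. After all elements have been processed, the K-th smallest element is at the root.

This approach uses O(K) space and O(N log K) time.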
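The steps above can be sketched as follows, assuming a hypothetical function name `kth_smallest_heap` and feeding the heap directly from the recurrence instead of from a stored array.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// K-th smallest (1-based) via a max-heap holding the K smallest values seen so far.
// The function name is illustrative; the sequence is generated on the fly.
std::int64_t kth_smallest_heap(int n, int k, std::int64_t a, std::int64_t b,
                               std::int64_t c, std::int64_t d) {
    assert(1 <= k && k <= n);
    std::priority_queue<std::int64_t> heap;  // std::priority_queue is a max-heap by default
    std::int64_t s = a;
    for (int i = 0; i < n; ++i) {
        if (static_cast<int>(heap.size()) < k) {
            heap.push(s);            // still filling the first K slots
        } else if (s < heap.top()) {
            heap.pop();              // drop the current K-th smallest candidate
            heap.push(s);
        }                            // otherwise s cannot be among the K smallest
        s = (b * s + c) % d;
    }
    return heap.top();
}
```

The heap holds K values, so for K close to N the memory use is still close to N * 8 bytes. The second listing in the question works around this by switching to a min-heap of size N - K + 1 whenever that is smaller, which caps the heap at roughly N/2 elements.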

Answer 4

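There is an algorithm that people often call LazySelect, and I think it is a perfect fit here.

With high probability we make two passes. In the first pass, we save a random sample of size n, much smaller than N. The answer will be near index (K/N)n of the sorted sample, but because of the randomness we have to be careful. Save the values a and b at (K/N)n ± r, where r is the radius of the window. In the second pass, we save all values in [a, b], count the number of values less than a (call it L), and select the value with index K−L if it lies in the window (otherwise, try again).

The theoretical advice on choosing n and r is fine, but I would be pragmatic here. Choose n so that you use most of the available memory; the bigger the sample, the more informative it is. Choose r fairly large as well, but not quite as aggressively, because of the randomness.

C++ code below. On the online judge it is faster than Kelly's (at most 1.3 seconds for the T=3 tests, at most 0.5 seconds for the T=1 tests).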

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>
#include <optional>
#include <random>
#include <vector>

namespace {

class LazySelector {
public:
  static constexpr std::int32_t kTargetSampleSize = 1000;

  explicit LazySelector() { sample_.reserve(1000000); }

  void BeginFirstPass(const std::int32_t n, const std::int32_t k) {
    sample_.clear();
    mask_ = n / kTargetSampleSize;
    mask_ |= mask_ >> 1;
    mask_ |= mask_ >> 2;
    mask_ |= mask_ >> 4;
    mask_ |= mask_ >> 8;
    mask_ |= mask_ >> 16;
  }

  void FirstPass(const std::int64_t value) {
    if ((gen_() & mask_) == 0) {
      sample_.push_back(value);
    }
  }

  void BeginSecondPass(const std::int32_t n, const std::int32_t k) {
    sample_.push_back(std::numeric_limits<std::int64_t>::min());
    sample_.push_back(std::numeric_limits<std::int64_t>::max());
    const double p = static_cast<double>(sample_.size()) / n;
    const double radius = 2 * std::sqrt(sample_.size());
    const auto lower =
        sample_.begin() + std::clamp<std::int32_t>(std::floor(p * k - radius),
                                                   0, sample_.size() - 1);
    const auto upper =
        sample_.begin() + std::clamp<std::int32_t>(std::ceil(p * k + radius), 0,
                                                   sample_.size() - 1);
    std::nth_element(sample_.begin(), upper, sample_.end());
    std::nth_element(sample_.begin(), lower, upper);
    lower_ = *lower;
    upper_ = *upper;
    sample_.clear();
    less_than_lower_ = 0;
    equal_to_lower_ = 0;
    equal_to_upper_ = 0;
  }

  void SecondPass(const std::int64_t value) {
    if (value < lower_) {
      ++less_than_lower_;
    } else if (upper_ < value) {
    } else if (value == lower_) {
      ++equal_to_lower_;
    } else if (value == upper_) {
      ++equal_to_upper_;
    } else {
      sample_.push_back(value);
    }
  }

  std::optional<std::int64_t> Select(std::int32_t k) {
    if (k < less_than_lower_) {
      return std::nullopt;
    }
    k -= less_than_lower_;
    if (k < equal_to_lower_) {
      return lower_;
    }
    k -= equal_to_lower_;
    if (k < sample_.size()) {
      const auto kth = sample_.begin() + k;
      std::nth_element(sample_.begin(), kth, sample_.end());
      return *kth;
    }
    k -= sample_.size();
    if (k < equal_to_upper_) {
      return upper_;
    }
    return std::nullopt;
  }

private:
  std::default_random_engine gen_;
  std::vector<std::int64_t> sample_ = {};
  std::int32_t mask_ = 0;
  std::int64_t lower_ = std::numeric_limits<std::int64_t>::min();
  std::int64_t upper_ = std::numeric_limits<std::int64_t>::max();
  std::int32_t less_than_lower_ = 0;
  std::int32_t equal_to_lower_ = 0;
  std::int32_t equal_to_upper_ = 0;
};

} // namespace

int main() {
  int t;
  std::cin >> t;
  for (int i = t; i > 0; --i) {
    std::int32_t n;
    std::int32_t k;
    std::int64_t a;
    std::int64_t b;
    std::int64_t c;
    std::int64_t d;
    std::cin >> n >> k >> a >> b >> c >> d;
    std::optional<std::int64_t> ans = std::nullopt;
    LazySelector selector;
    do {
      {
        selector.BeginFirstPass(n, k);
        std::int64_t s = a;
        for (std::int32_t j = n; j > 0; --j) {
          selector.FirstPass(s);
          s = (b * s + c) % d;
        }
      }
      {
        selector.BeginSecondPass(n, k);
        std::int64_t s = a;
        for (std::int32_t j = n; j > 0; --j) {
          selector.SecondPass(s);
          s = (b * s + c) % d;
        }
      }
      ans = selector.Select(k - 1);
    } while (!ans);
    std::cout << *ans << '\n';
  }
}


4 solutions

Solution 1 3 accepted 2023-01-28 03:31:48

Solution 2 2 2023-01-27 06:44:29

Solution 3 1 2023-01-27 07:14:40

Solution 4 1 2023-01-27 13:08:20
