
R parallel write SEXP structure

I am working on a data processing module in R using C/C++ code, mainly for speed reasons. Here is a list of facts about my problem.

  • The final outcome data is a list of string vectors and takes between 20 MB and 200 MB of memory.
  • The data processing fits a single-producer/multiple-consumer model.
  • Converting my vector<vector<string> > to a List with wrap takes a significant amount of time.

Therefore I intend to work directly on SEXP structures, which could save the time spent on the final conversion. My main function looks like this:

boost::atomic<bool> done(false);
SEXP myfun(...) {
    ...
    SEXP sdataStr;
    PROTECT(sdataStr=allocVector(VECSXP, nElem));
    vector<SEXP> dataStr(nElem);
    for (int i=0; i<nElem; ++i) {
         dataStr[i]=SET_VECTOR_ELT(sdataStr, i, allocVector(STRSXP, n));
    }
    Producer producer(&queue);
    // Arguments match the two-parameter Consumer constructor shown below.
    Consumer consumer1(dataStr, &queue);
    Consumer consumer2(dataStr, &queue);

    boost::thread produce(producer);
    boost::thread consume1(consumer1);
    boost::thread consume2(consumer2);

    produce.join();
    done=true;
    consume1.join();
    consume2.join();
    UNPROTECT(1);
    return sdataStr;
}

My consumer class looks like this:

class Consumer {
    vector<SEXP>& m_dataStr;
    boost::lockfree::queue<buffer>* m_queue;
    buffer m_buffer;

    public:
    Consumer(vector<SEXP>& dataStr, boost::lockfree::queue<buffer>* queue) : m_dataStr(dataStr), m_queue(queue) {}

    void operator()() {
        while (!done) {
            while (m_queue->pop(m_buffer)) {
                process_item();
            }
        }
        while (m_queue->pop(m_buffer)) {
            process_item();
        }
    }

    private:
    void process_item() {
        ...
        // for some 0<=idx<nElem, 0<=i<n, some char* f and integer len
        SET_STRING_ELT(m_dataStr[idx], i, mkCharLen(f,len));
        ...
    }
};

These are the only places where I use Rinternals. The logic of the program ensures that different threads never write to the same place, i.e. each combination of idx and i in the Consumer class occurs at most once. I encountered various strange problems, such as "stack imbalance" or "snapping into wrong generation". Is there something I am missing? Or is calling SET_STRING_ELT from multiple threads not recommended? Thank you very much!

C/R API functions should not be called from threads unless you know what you are doing. For example, mkCharLen might modify the internal hash table that is used for all R strings, so you can't call it in a thread. SET_STRING_ELT is probably also not usable in a thread, especially if the write barrier is on.
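
One way to follow this advice, as a rough sketch rather than a definitive fix, is to let the worker threads write only into plain C++ storage and to touch the R API from the main thread after every worker has been joined. The names Staging, fill_slots and build_staging are hypothetical, std::thread stands in for boost::thread, and the generated strings are placeholders for whatever process_item would produce.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Plain C++ staging area: one vector of strings per output list element.
// No R API function is called while worker threads are running.
typedef std::vector<std::vector<std::string> > Staging;

// Hypothetical worker: fills list elements first, first + step, first + 2*step, ...
// Each (idx, i) slot is written by exactly one thread, so no locking is needed.
// Assumes step >= 1.
void fill_slots(Staging& out, std::size_t first, std::size_t step) {
    for (std::size_t idx = first; idx < out.size(); idx += step) {
        for (std::size_t i = 0; i < out[idx].size(); ++i) {
            // Placeholder payload; real code would store the parsed field here.
            out[idx][i] = "item_" + std::to_string(idx) + "_" + std::to_string(i);
        }
    }
}

// Allocates the staging area up front, runs the workers, and joins them all
// before returning. Assumes nThreads >= 1.
Staging build_staging(std::size_t nElem, std::size_t n, std::size_t nThreads) {
    Staging staging(nElem, std::vector<std::string>(n));
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t) {
        workers.emplace_back(fill_slots, std::ref(staging), t, nThreads);
    }
    for (std::size_t t = 0; t < workers.size(); ++t) {
        workers[t].join();
    }
    return staging;
}
```

After build_staging returns, only the main R thread would loop over the staging area and call SET_STRING_ELT with mkCharLen for each entry. That step is left out here so the block compiles without R headers. It keeps a serial conversion pass, but all parsing work still runs in parallel.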
