How to convert a vector<wstring> to a wchar_t**?

I need to create a C-compatible (C-friendly) return type so that my C++ functions can be used together with C-based functions.

How can I convert a vector of wstring to a wchar_t** array?

You can iterate over the vector of wstring and store each wstring::c_str() pointer in the wchar_t** array.

Far better to avoid doing this at all if you possibly can.

If you really have no choice, you'd basically allocate an array of pointers, then allocate space for each string, and copy each individual string in the input to the buffer you allocated for it.

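#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Copy one string into a new[]-allocated buffer; the caller must delete[] it.
wchar_t *dupe_string(std::wstring const &input) {
    wchar_t *ret = new wchar_t[input.size() + 1];
    std::wcscpy(ret, input.c_str());
    return ret;
}

// Build the array of pointers; the caller owns the array and every string in it.
wchar_t **ruin(std::vector<std::wstring> const &input) {
    wchar_t **trash = new wchar_t *[input.size()];
    for (std::size_t i = 0; i < input.size(); i++)
        trash[i] = dupe_string(input[i]);
    return trash;
}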
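
The code above allocates but never frees, so whoever receives the array also needs a way to release it. A minimal sketch of such a pair follows; the names make_c_strings and free_c_strings are hypothetical, and the trailing null pointer is an assumption added here so the release function does not need a separate count.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: copies every string into new[]-allocated storage and
// appends a null pointer so C code can find the end of the array.
wchar_t **make_c_strings(const std::vector<std::wstring> &input) {
    // The trailing () value-initializes every slot to nullptr.
    wchar_t **out = new wchar_t *[input.size() + 1]();
    for (std::size_t i = 0; i < input.size(); ++i) {
        out[i] = new wchar_t[input[i].size() + 1];
        std::wcscpy(out[i], input[i].c_str());
    }
    return out;  // out[input.size()] stays nullptr as the sentinel
}

// Hypothetical counterpart: releases everything make_c_strings allocated.
void free_c_strings(wchar_t **strings) {
    if (strings == nullptr)
        return;
    for (wchar_t **p = strings; *p != nullptr; ++p)
        delete[] *p;
    delete[] strings;
}
```

Both functions have to live on the C++ side: memory obtained with new[] must not be released with free() from the C code.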

Based on the comments, however, I have some misgivings about whether this applies to the current situation. It assumes the input is wide strings, which typically means UTF-16 or UTF-32/UCS-4. If the input is really UTF-8, then the storage elements you're dealing with are char, not wchar_t, so your input should be narrow strings (std::string) and the matching output char ** rather than wchar_t **.

std::wstring is an instantiation of the std::basic_string template (basic_string<wchar_t>), so its c_str() function returns const wchar_t*.

So, you can do something like

std::vector<const wchar_t*> pointers;
pointers.reserve(wstrVec.size());
for (auto it = wstrVec.begin(); it != wstrVec.end(); ++it) {
    pointers.push_back(it->c_str());
}

const wchar_t** cptr = pointers.data();

Without more context it's difficult to advise on the best way to deal with scope/lifetime issues. Are you writing a library (which suggests you have no control over scope) or providing an API for callbacks from C code you are supervising?

A common approach is to provide a sizing API so that the caller can supply a destination buffer of appropriate size:

size_t howManyWstrings()
{
    return wstrVec.size();
}

bool getWstrings(const wchar_t** into, size_t intoSize /*in pointers*/)
{
    const size_t vecSize = wstrVec.size();
    if (intoSize < vecSize || into == nullptr)
        return false;
    for (size_t i = 0; i < vecSize; ++i) {
        into[i] = wstrVec[i].c_str();
    }
    return true;
}
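
On the caller's side, that pattern is a two-step dance: ask for the count, allocate, then fill. Here is a rough sketch; the contents of wstrVec and the fetchAll name are invented for illustration, and the two API functions are repeated in compact form only so the snippet compiles on its own.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for the library side; the contents of wstrVec are made up.
static std::vector<std::wstring> wstrVec = {L"alpha", L"beta"};

std::size_t howManyWstrings() { return wstrVec.size(); }

bool getWstrings(const wchar_t **into, std::size_t intoSize) {
    if (into == nullptr || intoSize < wstrVec.size())
        return false;
    for (std::size_t i = 0; i < wstrVec.size(); ++i)
        into[i] = wstrVec[i].c_str();
    return true;
}

// Caller side: query the count, allocate the pointer array, then fill it.
// The pointers are borrowed and valid only while wstrVec is unchanged.
std::vector<const wchar_t *> fetchAll() {
    std::vector<const wchar_t *> buffer(howManyWstrings());
    if (!getWstrings(buffer.data(), buffer.size()))
        buffer.clear();  // nothing usable was written
    return buffer;
}
```

A C caller would do the same with malloc(howManyWstrings() * sizeof(const wchar_t *)) and free only the pointer array afterwards, never the strings it points to.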

It sounds like your C function is expecting a pointer to a wchar_t buffer, and to be able to move this pointer around.

Well, this is mostly easy, though you'll have to manage the lifetime of the pointer. To that end, I suggest not doing this as a return type (and thus letting C ruin your API, not to mention your code's sanity), but performing this logic at the call site of the C function:

/** A function that produces your vector */
std::vector<wchar_t> foo();

/** The C function in question */
void theCFunction(wchar_t**);

int main()
{
   std::vector<wchar_t> v = foo();
   wchar_t* ptr = v.data();  // unlike &v[0], well-defined even if v is empty
   theCFunction(&ptr);
}

BTW, from the question and some comments it sounds like you misunderstand what char and wchar_t are. They sit below the encoding layer: if you have UTF-8, you should store each byte of your UTF-8 string as, well, a single byte. This means using chars, as in a std::string. Sure, each individual byte in that string will not necessarily represent a single logical Unicode character, but that is not the point.
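
To make the byte-versus-character distinction concrete, here is a small sketch. The function name count_code_points is made up, and it assumes the input is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes
// (those of the form 10xxxxxx). Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const std::string &utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8)
        if ((byte & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For the string "caf\xC3\xA9" (the word "café" encoded as UTF-8), size() reports 5 bytes while count_code_points reports 4 code points. The std::string stores the bytes faithfully either way.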

This function converts a vector of std::wstring to a wchar_t** array. Unlike the other answers, it does not leak memory when called repeatedly, because it calls DisposeBuffer() first.

wchar_t ** xGramManipulator::GetCConvertedString(vector< wstring> const &input)
{
    DisposeBuffer();  //This is to avoid memory leak for calling this function multiple times
    cStringArraybuffer = new wchar_t*[input.size()]; //cStringArraybuffer is a member variable of type wchar_t**
    for (size_t i = 0; i < input.size(); i++)
    {
        cStringArraybuffer[i] = new wchar_t[input[i].size()+1];
        wcscpy_s(cStringArraybuffer[i], input[i].size() + 1, input[i].c_str());
        cStringArraySize++;
    }
    return cStringArraybuffer;
}

And this is the DisposeBuffer helper function that avoids the memory leaks:

void xGramManipulator::DisposeBuffer(void)
{
    for (size_t i = 0; i < cStringArraySize; i++)
    {
        delete [] cStringArraybuffer[i];
    }
    delete [] cStringArraybuffer;
    cStringArraybuffer = nullptr;
    cStringArraySize = 0;
}

Before using either of these, allocate a dummy buffer in your constructor:

xGramManipulator::xGramManipulator()
{
    // Allocate a dummy array so that de-allocating it in GetCConvertedString() does not cause undefined behavior

    cStringArraybuffer = new wchar_t*[1];
    cStringArraySize = 0;
    for (int i = 0; i < 1; i++)
    {
        cStringArraybuffer[i] = new wchar_t[1 + 1];
        cStringArraySize++;
    }
}

And that's all.
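
For comparison, the same ownership idea can be written without any manual new/delete by letting standard containers hold the storage. This is an alternative sketch, not the class from this answer: CStringArray and its members are hypothetical names, and the trailing null pointer is an extra convenience for C code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical owner class: keeps private copies of the strings and hands out
// a wchar_t** view. Storage is released automatically by the destructor.
class CStringArray {
public:
    explicit CStringArray(const std::vector<std::wstring> &input)
        : copies_(input) {
        pointers_.reserve(copies_.size() + 1);
        for (std::wstring &s : copies_)
            pointers_.push_back(&s[0]);  // writable, null-terminated since C++11
        pointers_.push_back(nullptr);    // sentinel for C code
    }

    // Copying would leave pointers_ aimed at the other object's strings.
    CStringArray(const CStringArray &) = delete;
    CStringArray &operator=(const CStringArray &) = delete;

    wchar_t **get() { return pointers_.data(); }
    std::size_t size() const { return copies_.size(); }

private:
    std::vector<std::wstring> copies_;  // owns the character data
    std::vector<wchar_t *> pointers_;   // borrowed pointers into copies_
};
```

The pointers returned by get() remain valid for as long as the CStringArray object lives, so it should be kept alive for the whole duration of the C call.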
