Cast void* based on enum value in C++

I am writing a Python library in C++ using the Python C API. It has about 25 functions that all accept two strings. Python may store a string as UTF-8/16/32 (the moment one character requires a bigger size, the whole string uses the bigger size). When checking which kind a string is, you get an enum value between 0 and 4: 0 and 4 should be handled as UTF-32, 1 as UTF-8 and 2 as UTF-16. So I currently have a nested switch for each combination.

The following example shows how the elements are handled in my code. random_func is different for each of my functions and is a template that accepts a string_view of any type. Writing the code this way results in about 100 lines of boilerplate for each function that accepts two strings.

Is there a way to handle all these cases without this immense code duplication and without sacrificing performance?

double result = 0;
Py_ssize_t len_s1 = PyUnicode_GET_LENGTH(py_s1);
void* s1 = PyUnicode_DATA(py_s1);

Py_ssize_t len_s2 = PyUnicode_GET_LENGTH(py_s2);
void* s2 = PyUnicode_DATA(py_s2);

int s1_kind = PyUnicode_KIND(py_s1);
int s2_kind = PyUnicode_KIND(py_s2);

switch (s1_kind) {
case PyUnicode_1BYTE_KIND:
    switch (s2_kind) {
    case PyUnicode_1BYTE_KIND:
        result = random_func(
            basic_string_view<char>(static_cast<char*>(s1), len_s1),
            basic_string_view<char>(static_cast<char*>(s2), len_s2));
        break;
    case PyUnicode_2BYTE_KIND:
        result = random_func(
            basic_string_view<char>(static_cast<char*>(s1), len_s1),
            basic_string_view<char16_t>(static_cast<char16_t*>(s2), len_s2));
        break;
    default:
        result = random_func(
            basic_string_view<char>(static_cast<char*>(s1), len_s1),
            basic_string_view<char32_t>(static_cast<char32_t*>(s2), len_s2));
        break;
    }
    break;
case PyUnicode_2BYTE_KIND:
    switch (s2_kind) {
    case PyUnicode_1BYTE_KIND:
        result = random_func(
            basic_string_view<char16_t>(static_cast<char16_t*>(s1), len_s1),
            basic_string_view<char>(static_cast<char*>(s2), len_s2));
        break;
    case PyUnicode_2BYTE_KIND:
        result = random_func(
            basic_string_view<char16_t>(static_cast<char16_t*>(s1), len_s1),
            basic_string_view<char16_t>(static_cast<char16_t*>(s2), len_s2));
        break;
    default:
        result = random_func(
            basic_string_view<char16_t>(static_cast<char16_t*>(s1), len_s1),
            basic_string_view<char32_t>(static_cast<char32_t*>(s2), len_s2));
        break;
    }
    break;
default:
    switch (s2_kind) {
    case PyUnicode_1BYTE_KIND:
        result = random_func(
            basic_string_view<char32_t>(static_cast<char32_t*>(s1), len_s1),
            basic_string_view<char>(static_cast<char*>(s2), len_s2));
        break;
    case PyUnicode_2BYTE_KIND:
        result = random_func(
            basic_string_view<char32_t>(static_cast<char32_t*>(s1), len_s1),
            basic_string_view<char16_t>(static_cast<char16_t*>(s2), len_s2));
        break;
    default:
        result = random_func(
            basic_string_view<char32_t>(static_cast<char32_t*>(s1), len_s1),
            basic_string_view<char32_t>(static_cast<char32_t*>(s2), len_s2));
        break;
    }
    break;
}

Put the complexity away in a function using variants:

#include <string_view>
#include <variant>

using python_string_view = std::variant<std::basic_string_view<char>,
    std::basic_string_view<char16_t>,
    std::basic_string_view<char32_t>>;

python_string_view decode_python_string(python_string py_str)
{
    Py_ssize_t len_s = PyUnicode_GET_LENGTH(py_str);
    void* s = PyUnicode_DATA(py_str);
    int s_kind = PyUnicode_KIND(py_str);

    switch (s_kind) {
        //return correct string_view here
    }
}

int main()
{
    python_string s1 = ..., s2 = ...;
    auto v1 = decode_python_string(s1);
    auto v2 = decode_python_string(s2);
    std::visit([](auto&& val1, auto&& val2) {
        random_func(val1, val2);
    }, v1, v2);
}

I'm unsure about the performance though.
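To make the variant idea above a bit more concrete, here is a minimal self-contained sketch that compiles without the Python headers (C++17). The `raw_string` struct, `decode`, `code_point` and `dispatch` are names I made up for illustration; `raw_string` only stands in for the values you would get from PyUnicode_KIND, PyUnicode_DATA and PyUnicode_GET_LENGTH, and this `random_func` is just a placeholder that counts matching positions, not one of the real functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for the kind, data pointer and length of a Python string.
struct raw_string {
    int kind;            // 1 -> 1-byte, 2 -> 2-byte, anything else -> 4-byte
    const void* data;
    std::size_t length;  // length in characters, not bytes
};

using python_string_view = std::variant<std::basic_string_view<char>,
                                        std::basic_string_view<char16_t>,
                                        std::basic_string_view<char32_t>>;

// Turn the runtime kind into the matching string_view alternative.
python_string_view decode(const raw_string& s)
{
    assert(s.data != nullptr || s.length == 0);
    switch (s.kind) {
    case 1:
        return std::basic_string_view<char>(static_cast<const char*>(s.data), s.length);
    case 2:
        return std::basic_string_view<char16_t>(static_cast<const char16_t*>(s.data), s.length);
    default:
        return std::basic_string_view<char32_t>(static_cast<const char32_t*>(s.data), s.length);
    }
}

// Widen any character type to char32_t without sign extension.
template <typename CharT>
char32_t code_point(CharT c)
{
    return static_cast<char32_t>(static_cast<std::make_unsigned_t<CharT>>(c));
}

// Placeholder for one of the templated functions: counts the positions
// at which both strings hold the same character value.
template <typename View1, typename View2>
double random_func(View1 a, View2 b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    double matches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (code_point(a[i]) == code_point(b[i])) ++matches;
    }
    return matches;
}

// The nested switch collapses into a single std::visit over both variants.
double dispatch(const raw_string& s1, const raw_string& s2)
{
    return std::visit(
        [](auto v1, auto v2) { return random_func(v1, v2); },
        decode(s1), decode(s2));
}
```

With this shape, each of the 25 functions would need only its own `dispatch`-style wrapper (or a shared one taking the function as a callable), while the compiler still instantiates all nine combinations of `random_func`. Whether the double visit costs anything measurable compared with the hand-written switch is something I would benchmark rather than assume.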

For what it is worth:

The different char types only make a difference at the moment you extract the character values inside random_func (which requires nine template specializations, if I am right).

You would be close to a solution by fetching the chars in all cases using the largest type and masking out or shifting out the extra bytes where necessary. Instead of templating, you would pass a suitable mask and stride information. Something like

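// e.g. mask = 0xFF, 0xFFFF or 0xFFFFFFFF and stride = 1, 2 or 4 bytes (little-endian layout assumed)
const char* p = (const char*)s1;
for (char32_t c; (c = *(const char32_t*)p & mask) != 0; p += stride)
{
    …
}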

Unfortunately, even leaving aside the extra masking operation, you hit a wall: at one end of the string you may have to fetch more bytes than the buffer holds, causing an illegal memory access.
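One possible way around the over-read, sketched here under my own assumptions rather than as a tested solution, is to drop the mask and copy exactly as many bytes as the character is wide. `load_char` and `count_equal` are hypothetical names, and `count_equal` merely plays the role of a non-templated random_func that receives a width per string instead of a character type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read character i from a buffer whose characters are `width` bytes wide.
// Only `width` bytes are touched, so nothing is read past the end of the
// buffer, and memcpy avoids unaligned or aliasing-violating loads.
inline char32_t load_char(const void* data, std::size_t i, std::size_t width)
{
    assert(width == 1 || width == 2 || width == 4);
    const unsigned char* p = static_cast<const unsigned char*>(data) + i * width;
    switch (width) {
    case 1:
        return *p;
    case 2: {
        std::uint16_t v;
        std::memcpy(&v, p, sizeof v);
        return v;
    }
    default: {
        std::uint32_t v;
        std::memcpy(&v, p, sizeof v);
        return v;
    }
    }
}

// Non-templated placeholder: counts positions where both strings agree.
// The lengths are passed explicitly, so no terminator is needed.
inline double count_equal(const void* s1, std::size_t len1, std::size_t width1,
                          const void* s2, std::size_t len2, std::size_t width2)
{
    const std::size_t n = len1 < len2 ? len1 : len2;
    double matches = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (load_char(s1, i, width1) == load_char(s2, i, width2)) ++matches;
    }
    return matches;
}
```

The price is a branch on the width for every character fetched, so this trades the nine instantiations for some per-character overhead. It may well be slower than the templated version in a hot loop.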
