
How to properly convert std::string to an integer vector

My high-level goal is to convert any string (which may include non-ASCII characters) into a vector of integers by converting each character to an integer.

I already have a Python code snippet for this purpose:

bytes = list(text.encode())

Now I want a C++ equivalent. I tried something like:

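#include <cstring>
#include <iostream>
#include <string>
#include <vector>

int main() {
  std::string inputText = "testΔ";  // sample input from the question
  char const* bytes = inputText.c_str();
  long bytesLen = std::strlen(bytes);
  auto vec = std::vector<long>(bytes, bytes + bytesLen);
  for (auto number : vec) {
      std::cout << number << std::endl;
  }
  return 0;
}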

For an input string like "testΔ", the Python code outputs [116, 101, 115, 116, 206, 148].

However, the C++ code outputs [116, 101, 115, 116, -50, -108].

How should I change the C++ code to make them consistent?

"However C++ code outputs [116, 101, 115, 116, -50, -108]."

In C++, char is a distinct type from both signed char and unsigned char, and whether it is signed is implementation-defined. In your implementation it is signed, which is why the bytes 206 and 148 show up as -50 and -108.

You thus explicitly want a const unsigned char*, but the .c_str() method gives you a const char*, so you need to cast. You will need reinterpret_cast or a C-style cast; static_cast will not work, because it cannot convert between pointers to these two distinct types.
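A minimal sketch of that cast, assuming the goal is a std::vector<int> of byte values. The helper name to_byte_values is hypothetical, not something from the question or a library:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the bytes of `text` as values in 0..255,
// regardless of whether plain char is signed on this platform.
std::vector<int> to_byte_values(const std::string& text) {
    // data() yields const char*; viewing the same storage through
    // const unsigned char* requires reinterpret_cast (static_cast is rejected).
    const unsigned char* first =
        reinterpret_cast<const unsigned char*>(text.data());
    // size() is used instead of strlen() so embedded '\0' bytes are kept.
    return std::vector<int>(first, first + text.size());
}
```

If the string holds the UTF-8 encoding of "testΔ", this should yield the same values as the Python snippet. Whether a string literal in your source is UTF-8 depends on the compiler and source encoding, so that part is an assumption.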

You can iterate over the contents of a std::string directly; there is no need to convert it to a std::vector first. Try this:

#include <iostream>
#include <string>

int main()
{
    std::string str = "abc";
    for (auto c : str)
    {
        // Convert to unsigned char first so bytes >= 0x80 print as 128..255
        // even where plain char is signed.
        std::cout << static_cast<unsigned int>(static_cast<unsigned char>(c)) << std::endl;
    }
}

The cast is needed because the standard operator<< prints a char as a character, not as a number. Otherwise, you can work with char just like any other integral type. The value goes through unsigned char before being widened to unsigned int so that the output stays in the range 0 to 255: the signedness of char is implementation-defined, and casting a negative char directly to unsigned int would wrap around to a very large value instead.
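If you do want the values collected rather than printed, the same per-character conversion can fill a vector without any pointer cast. This is a sketch; byte_value and byte_values are hypothetical names. Note that each char is converted to unsigned char before widening, since a negative char converted straight to unsigned int would not give a value in 0 to 255.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: maps one char to its byte value in 0..255.
unsigned int byte_value(char c) {
    // Convert to unsigned char first; the result then widens to
    // unsigned int without changing its value.
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: collects the byte value of every char in `text`.
std::vector<unsigned int> byte_values(const std::string& text) {
    std::vector<unsigned int> out;
    out.reserve(text.size());
    for (char c : text) {
        out.push_back(byte_value(c));
    }
    return out;
}
```

This avoids reinterpret_cast at the cost of an explicit loop.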

"How should I change the C++ code to make them consistent?"

The difference appears to be that Python uses unsigned byte values (0 to 255), while char is signed in your C++ implementation. One solution: reinterpret the string's contents as an array of unsigned char.
