How do I write a C++ function that takes a reference to either a 'char *' or a 'const char*'?

I am writing a function that extracts Unicode characters from a string one at a time. The argument is a reference to a pointer to a char, which the function advances to the next character before returning a value. Here is the entire function:

uint16_t get_char_and_inc(const char *&c) {
  uint16_t val = *c++;
  if ((val & 0xC0) == 0xC0)
    while ((*c & 0xC0) == 0x80)
      val = (val << 8) | *c++;
  return val;
}

As many have pointed out, this UTF-8 decoder is not technically correct: it is limited to 16-bit codes and it does not remove the encoding bits. It is sufficient for my limited graphics library for microcontrollers, though :)

The complexity of this function is irrelevant to the question, so assume it is simply this:

uint16_t get_utf8_char_and_inc(const char *&c) {
  return *c++;
}

The problem I am having is that I would like it to work for both char * and const char*, i.e.:

void main() {
  const char cc[] = "ab";
  get_char_and_inc(cc);
  printf(cc);
  
  char c[] = "ab";
  get_char_and_inc(c); // This does not compile
  printf(c);
}

Expected output:

b
b

However, the second call gives me the error:

invalid initialization of non-const reference of type 'const char*&' from an rvalue of type 'const char*'

There are several questions on Stack Overflow about this particular error message. Usually they concern passing a const char* as a char *, which is illegal. But in this case, I am going from a char * to a const char*. I feel like this should be legal, as I am simply adding a guarantee not to modify the data in the function.

Reading through other answers, it appears the compiler makes a copy of the pointer, turning it into a temporary r-value. I understand why this may be necessary in non-trivial conversions, but it seems like it should not be necessary here at all. In fact, if I drop the "&" from the function signature, it compiles just fine, but of course the pointer is then passed by value and the program prints "ab" instead of "b".

Currently, to make this work, I have to write the function twice, once taking const char *&c and once taking char *&c. This seems inefficient to me, as the code is exactly the same. Is there any way to avoid the duplication?

char* and const char* are not the same type, and a non-const reference to a pointer only binds to an lvalue of exactly that pointer type. That is why you can't pass a char* pointer, a char[] array, or a const char[] array to a const char*& parameter. Each of them would first have to be converted to a const char*, which yields a temporary, and a non-const reference cannot bind to a temporary.
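The binding rule can be checked at compile time. The following is only a sketch, assuming a C++17 compiler; the alias ByRef is a name made up for this illustration and merely mirrors the signature from the question. The restriction is also what keeps const-correctness intact: if a char* variable could bind to a const char*& parameter, the function could store the address of const data in the caller's non-const pointer.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical alias mirroring the question's signature; nothing is called.
using ByRef = uint16_t (*)(const char*&);

// An lvalue of type const char* binds directly to const char*&.
static_assert(std::is_invocable_v<ByRef, const char*&>);

// A char* lvalue would need a converted temporary, and a non-const
// lvalue reference cannot bind to a temporary.
static_assert(!std::is_invocable_v<ByRef, char*&>);

// Arrays decay to a pointer prvalue, so they are rejected as well,
// even when the element type is already const.
static_assert(!std::is_invocable_v<ByRef, const char (&)[3]>);
static_assert(!std::is_invocable_v<ByRef, char (&)[3]>);
```

The third assertion corresponds to passing const char cc[] directly: the array name is not a pointer lvalue, so it has to be copied into a named const char* first.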

In this case, to make get_char_and_inc() a single function that can handle multiple reference types, make it a function template, e.g.:

#include <cstdint>
#include <cstdio>

template<typename T>
uint16_t get_char_and_inc(T* &c) {
  return *c++;
}

int main()
{
  const char *cc = "ab";
  printf("%p\n", cc);
  get_char_and_inc(cc); // deduces T = const char
  printf("%p\n", cc); // shows cc has been incremented
  
  char c[] = "ab";
  char *p = c;
  printf("%p\n", p);
  get_char_and_inc(p); // deduces T = char
  printf("%p\n", p); // shows p has been incremented

  return 0;
}

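For completeness, here is a rough sketch of the same template applied to the multi-byte loop from the question. Two things in it are my own assumptions rather than part of the original code: the static_assert that restricts T to char or const char, and the casts through unsigned char, which keep bytes of 0x80 and above from being sign-extended on platforms where plain char is signed.

```cpp
#include <cstdint>
#include <type_traits>

// T is deduced as char or const char, so one definition serves both
// char* and const char* cursors.
template <typename T>
uint16_t get_char_and_inc(T*& c) {
  // Assumption: reject other pointer types (e.g. int*) with a clear message.
  static_assert(std::is_same_v<std::remove_const_t<T>, char>,
                "get_char_and_inc expects char* or const char*");

  // Assumption: read each byte as unsigned char to avoid sign extension.
  uint16_t val = static_cast<unsigned char>(*c++);
  if ((val & 0xC0) == 0xC0)               // lead byte of a multi-byte sequence
    while ((*c & 0xC0) == 0x80)           // append each continuation byte
      val = static_cast<uint16_t>((val << 8) | static_cast<unsigned char>(*c++));
  return val;
}
```

As in the question, the encoding bits are left in place, so the two-byte sequence C3 A9 comes back as 0xC3A9 rather than as the code point U+00E9.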

If you're worried about program size, you can add a static inline overload like this:

uint16_t get_char_and_inc(const char *&c);

static inline uint16_t get_char_and_inc(char *&c) {
    const char *cc = c;
    uint16_t r = get_char_and_inc(cc);
    c = const_cast<char*>(cc);
    return r;
}

Any optimizing compiler worth the name will collapse it down to nothing.

You could go functional and return a tuple, e.g. (demonstrating std::get and structured binding):

#include <iostream>
#include <tuple>
#include <stdlib.h>  // free
#include <string.h>  // strdup (POSIX)

std::tuple<int, char const*> get_char_and_inc(char const* c) {
  int x = static_cast<int>(*c);
  c++;
  return {x, c};
}

int main() {
  char const* cc = "ab";
  auto v1 = get_char_and_inc(cc);
  std::cout << std::get<0>(v1) << ", " <<
               std::get<1>(v1) << "\n";

  char* c = strdup("ab");
  auto [val2, next_c2] = get_char_and_inc(c);
  std::cout << val2 << ", " <<
               next_c2 << "\n";
  free (c);
  return 0;
}

See demo: https://godbolt.org/z/9EWf5zWaj - from there you can see that with -Os the object code is pretty compact (the only real bloat is for std::cout).
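One limitation of the version above is that it always hands back a char const*, even when the caller started with a char*. A possible variation, sketched here under the assumption of C++17, deduces the pointee type so the returned pointer keeps the caller's constness. The name get_char_and_next and the unsigned char cast are my own choices for the illustration, and like the simplified function in the question it only reads a single byte.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical variant: T is deduced as char or const char, and the
// returned "next" pointer has the same type as the argument.
template <typename T>
std::pair<uint16_t, T*> get_char_and_next(T* c) {
  // Read the byte as unsigned char so values >= 0x80 are not sign-extended.
  uint16_t val = static_cast<unsigned char>(*c);
  return {val, c + 1};  // the caller's own pointer is left untouched
}
```

Called with a char array, this returns a char* that can still be written through; called with a const char*, it returns a const char*.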

The problem is that you are passing the pointer to the string by reference. You can do it this way, but as you found out, you then can't mix const char* and char*. You can create a const char*, call it pCursor, and pass that in instead. I would recommend writing your function like below: you pass in a reference that receives the value, and the function returns a const char* pointing to the next character. I would also recommend not incrementing the pointer directly and using an index value instead.

const char* get_char_and_inc(const char* pStr, uint16_t& value)
{
    int currentIndex = 0;

    value = pStr[currentIndex++];

    if ((value & 0xC0) == 0xC0)
    {
        while ((pStr[currentIndex] & 0xC0) == 0x80)
        {
            value = (value << 8) | pStr[currentIndex++];
        }
    }

    return &pStr[currentIndex];
}

Then your main becomes:

int main()
{
    const char cc[] = "ab";

    uint16_t value;

    const char* pCursor = get_char_and_inc(cc, value);

    printf(pCursor);

    char c[] = "ab";

    pCursor = get_char_and_inc(c, value);

    printf(pCursor);
}
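Because the function returns the next position, walking a whole string becomes a simple loop. Here is a small sketch of that; get_char_and_inc is repeated from above so the snippet is self-contained, and count_chars is a hypothetical helper that is not part of the original answer.

```cpp
#include <cstddef>
#include <cstdint>

// Same function as in the answer above.
const char* get_char_and_inc(const char* pStr, uint16_t& value)
{
    int currentIndex = 0;

    value = pStr[currentIndex++];

    if ((value & 0xC0) == 0xC0)
    {
        while ((pStr[currentIndex] & 0xC0) == 0x80)
        {
            value = (value << 8) | pStr[currentIndex++];
        }
    }

    return &pStr[currentIndex];
}

// Hypothetical helper: walks the string with the returned cursor and
// counts how many characters were extracted.
std::size_t count_chars(const char* pStr)
{
    std::size_t count = 0;
    uint16_t value = 0;

    for (const char* pCursor = pStr; *pCursor != '\0'; ++count)
    {
        pCursor = get_char_and_inc(pCursor, value);
    }

    return count;
}
```

Both char[] and const char[] arguments work here, since each decays to a pointer that converts to const char* when passed by value.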

If you don't want to change your get_char_and_inc function, then you can change your main to this:

int main()
{
    const char cc[] = "ab";

    const char* pCursor = cc;

    get_char_and_inc(pCursor);
    printf(pCursor);

    char c[] = "ab";

    pCursor = c;

    get_char_and_inc(pCursor); // This now compiles: pCursor is a const char* lvalue
    printf(pCursor);
}
