
strcat adds junk to the string

I'm trying to reverse each word of a sentence without changing the order of the words.

For example: "Hello World" => "olleH dlroW"

Here is my code:

#include <stdio.h>
#include <string.h>

char * reverseWords(const char *text);
char * reverseWord(char *word);

int main () {
  char *text = "Hello World";
  char *result = reverseWords(text);
  char *expected_result = "olleH dlroW";
  printf("%s == %s\n", result, expected_result);
  printf("%d\n", strcmp(result, expected_result));
  return 0;
}

char *
reverseWords (const char *text) {
  // This function takes a string and reverses it words.
  int i, j;
  size_t len = strlen(text);
  size_t text_size = len * sizeof(char);
  // output containst the output or the result
  char *output;

  // temp_word is a temporary variable,
  // it contains each word and it will be
  // empty after each space.
  char *temp_word;

  // temp_char is a temporary variable,
  // it contains the current character
  // within the for loop below.
  char temp_char;

  // allocating memory for output.
  output = (char *) malloc (text_size + 1);

  for(i = 0; i < len; i++) {

    // if the text[i] is space, just append it
    if (text[i] == ' ') {
      output[i] = ' ';
    }

    // if the text[i] is NULL, just get out of the loop
    if (text[i] == '\0') {
      break;
    }

    // allocate memory for the temp_word
    temp_word = (char *) malloc (text_size + 1);

    // set j to 0, so we can iterate only on the word
    j = 0;

    // while text[i + j] is not space or NULL, continue the loop
    while((text[i + j] != ' ') && (text[i + j] != '\0')) {

      // assign and cast test[i+j] to temp_char as a character,
      // (it reads it as string by default)
      temp_char = (char) text[i+j];

      // concat temp_char to the temp_word
      strcat(temp_word, &temp_char); // <= PROBLEM

      // add one to j
      j++;
    }

    // after the loop, concat the reversed version
    // of the word to the output
    strcat(output, reverseWord(temp_word));

    // if text[i+j] is space, concat space to the output
    if (text[i+j] == ' ')
      strcat(output, " ");

    // free the memory allocated for the temp_word
    free(temp_word);

    // add j to i, so u can skip 
    // the character that already read.
    i += j;
  }

  return output;
}

char *
reverseWord (char *word) {
  int i, j;
  size_t len = strlen(word);
  char *output;

  output = (char *) malloc (len + 1);

  j = 0;
  for(i = (len - 1); i >= 0; i--) {
    output[j++] = word[i];
  }

  return output;
}

The problem is the line I marked with <= PROBLEM. On the first word, which in this case is "Hello", everything works fine.

On the second word, which in this case is "World", it adds junk characters to temp_word. I checked with gdb: temp_char doesn't contain the junk, but once strcat runs, the latest character appended to temp_word looks something like W\006.

It appends \006 to every character of the second word.

The output I see on the terminal looks fine, but printing the result of strcmp, which compares result with expected_result, gives -94.

  • What can be the problem?
  • What is the \006 character?
  • Why does strcat add it?
  • How can I prevent this behavior?

The root cause of the junk characters is that you pass the wrong input as the 2nd argument of strcat. See the explanation below:

At the beginning of your function you declare:

  int i, j;
  size_t len = strlen(text);
  size_t text_size = len * sizeof(char);
  // output containst the output or the result
  char *output;

  // temp_word is a temporary variable,
  // it contains each word and it will be
  // empty after each space.
  char *temp_word;

  // temp_char is a temporary variable,
  // it contains the current character
  // within the for loop below.
  char temp_char;

You can print the addresses of these variables on the stack; they will look something like this:

printf("&temp_char=%p,&temp_word=%p,&output=%p,&text_size=%p\n", &temp_char, &temp_word,&output,&text_size);
result:    
&temp_char=0x7ffeea172a9f,&temp_word=0x7ffeea172aa0,&output=0x7ffeea172aa8,&text_size=0x7ffeea172ab0

As you can see, temp_char (at 0x7ffeea172a9f) has the lowest address of these variables; temp_word starts 1 byte later (0x7ffeea172aa0), output starts 8 bytes after that (0x7ffeea172aa8), and so on. (I used a 64-bit OS, so a pointer takes 8 bytes.)

 // concat temp_char to the temp_word
  strcat(temp_word, &temp_char); // <= PROBLEM

See the strcat description here: http://www.cplusplus.com/reference/cstring/strcat/

The second argument passed to strcat is &temp_char = 0x7ffeea172a9f. strcat treats &temp_char (0x7ffeea172a9f) as the start of the source string. Instead of appending just one char as you expect, it appends to temp_word every byte starting at &temp_char until it meets a terminating null character. With the layout shown above, the bytes that follow temp_char are the bytes of the pointer temp_word itself, which is where the junk comes from.
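One way to avoid handing strcat the address of a lone char is to append the character by hand. The sketch below is only an illustration: append_char and its cap parameter are hypothetical names, not part of the original code, and it assumes dst already holds a valid, null-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append the single character c to the string in dst.
 * cap is the total size in bytes of the buffer dst points to.
 * Returns 0 on success, -1 if c plus the terminator would not fit. */
int append_char(char *dst, size_t cap, char c)
{
    assert(dst != NULL);

    size_t len = strlen(dst);   /* dst must already be a valid C string */

    if (len + 2 > cap) {
        return -1;              /* no room for c and the '\0' */
    }

    dst[len] = c;
    dst[len + 1] = '\0';        /* keep the result null-terminated */
    return 0;
}
```

In the question's loop this would replace the strcat call, for example append_char(temp_word, text_size + 1, temp_char), provided temp_word[0] has been set to '\0' right after the malloc.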

strcat() expects the addresses of the first characters of C strings, which are in fact char arrays with at least one element equal to '\0'.

Neither the memory temp_word points to nor the memory &temp_char points to meets this requirement.

Because of this, the infamous undefined behaviour is invoked; anything can happen from then on.

A possible fix would be to change

      temp_word = (char *) malloc (text_size + 1);

to become

      temp_word = malloc (text_size + 1); /* Not the issue but the cast is 
                                             just useless in C. */
      temp_word[0] = '\0';

and this

        strcat(temp_word, &temp_char);

to become

        strcat(temp_word, (char[2]){temp_char});

There might be other issues with the rest of the code.
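For illustration, here is a sketch of how the question's reverseWords could look with those other issues addressed as well. In the posted code, output is never initialised before strcat is called on it, reverseWord does not null-terminate its result, the buffer returned by reverseWord is never freed, and <stdlib.h> is not included. The helper reverse_into is a hypothetical name introduced here; this is one possible cleanup, not the only one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the len characters starting at src into dst
 * in reverse order, then null-terminate dst. */
static void reverse_into(char *dst, const char *src, size_t len)
{
    for (size_t k = 0; k < len; k++) {
        dst[k] = src[len - 1 - k];
    }
    dst[len] = '\0';
}

/* Returns a newly allocated string with every word reversed,
 * or NULL if allocation fails. The caller must free the result. */
char *reverseWords(const char *text)
{
    assert(text != NULL);

    size_t len = strlen(text);
    char *output = malloc(len + 1);
    char *temp_word = malloc(len + 1);   /* allocated once, reused per word */

    if (output == NULL || temp_word == NULL) {
        free(output);
        free(temp_word);
        return NULL;
    }
    output[0] = '\0';                    /* make output a valid empty string */

    size_t i = 0;
    while (i < len) {
        if (text[i] == ' ') {            /* copy spaces through unchanged */
            strcat(output, " ");
            i++;
            continue;
        }

        size_t j = 0;                    /* length of the word starting at i */
        while (text[i + j] != ' ' && text[i + j] != '\0') {
            j++;
        }

        reverse_into(temp_word, text + i, j);
        strcat(output, temp_word);       /* both arguments are real strings */
        i += j;
    }

    free(temp_word);
    return output;
}
```

Because output always holds a terminated string, an input made only of spaces or an empty input also produces a valid result. The repeated strcat calls rescan output each time, which is fine for short sentences.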

The function strcat deals with strings.

In this code snippet

  // assign and cast test[i+j] to temp_char as a character,
  // (it reads it as string by default)
  temp_char = (char) text[i+j];

  // concat temp_char to the temp_word
  strcat(temp_word, &temp_char); // <= PROBLEM

neither the pointer temp_word nor the pointer &temp_char points to a string.

Moreover, the array output is not terminated with the zero character, for example when the source string consists only of blanks.

In any case, your approach is too complicated and contains a lot of redundant code, for example the condition in the for loop and the condition in the if statement, which duplicate each other.

  for(i = 0; i < len; i++) {

    //…

    // if the text[i] is NULL, just get out of the loop
    if (text[i] == '\0') {
      break;
    }

The function can be written more simply, as shown in the demonstration program below.

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

char * reverse_words( const char *s )
{
    char *result = malloc( strlen( s ) + 1 );

    if ( result != NULL )
    {
        char *p = result;

        while ( *s != '\0' )
        {
            while ( isblank( ( unsigned char )*s ) )
            {
                *p++ = *s++;
            }


            const char *q = s;

            while ( !isblank( ( unsigned char )*q ) && *q != '\0' ) ++q;

            for ( const char *tmp = q; tmp != s; )
            {
                *p++ = *--tmp;
            }

            s = q;
        }

        *p = '\0';
    }

    return result;
}

int main(void) 
{
    const char *s = "Hello World";

    char *result = reverse_words( s );

    puts( s );
    puts( result );

    free( result );

    return 0;
}

The program output is

Hello World
olleH dlroW
