I was trying to make a text parser which separates words in a string based on the space character. However, something is going wrong.
#include <stdio.h>
#include <string.h>

int main() {
    //the string should end with a space to count the all words
    char name[30] = "hello world from jordan ";
    int start = 0;
    int end = strlen(name);
    int end_word = start;
    char full[20][20];
    memset(full, 0, 400);
    int number_of_words = 0;
    for (int w = 0; w < end; w++) {
        if (name[w] == ' ') {
            number_of_words++;
        }
    }
    int counter = 0;
    while (counter < number_of_words) {
        for (int i = start; i < end; i++) {
            if (name[i] == ' ') {
                start = i;
                break;
            }
        }
        for (int j = end_word; j < start; j++) {
            full[counter][j] = name[j];
        }
        end_word = start;
        start++;
        counter++;
    }
    for (int x = 0; x < 20; x++) {
        for (int y = 0; y < 20; y++) {
            printf("%c", full[x][y]);
        }
        printf("%d", x);
    }
    return 0;
}
Here is the strange output I get when I run the code:
hello0 world1 from2 jor3dan45678910111213141516171819
The first three words are stored correctly, but the fourth is not, and I don't know why.
I would like an explanation of the problem and, if possible, a more efficient way of writing this code without pointers.
Note: I'm a beginner, which is why I'm asking for a solution without pointers.
To start, trying to avoid pointers in C will be (very) hard. Just by their nature, arrays become pointers the instant you want to do anything useful with them. Array subscripting is syntactic sugar over pointer arithmetic (foo[2] is the same as *(foo + 2)). Passing an array to a function will cause it to decay to a pointer to the first element.
You use pointers several times in your code, whether you realise it or not.
As for the code...
Quick note: size_t, not int, is the appropriate type to use when working with memory sizes / indexing. I'll be using it in the "corrected" versions of the code, and you should try to use it in general, moving forward.
The output is a bit confusing because everything is printed on a single line. Let's clean that up, and add some debugging information, like the length of each string you've stored.
for (size_t x = 0; x < 20; x++) {
    printf("%zu [length: %zu]: ", x, strlen(full[x]));
    for (size_t y = 0; y < 20; y++)
        printf("%c", full[x][y]);
    putchar('\n');
}
Now we get the output, across several lines (some duplicates collapsed for brevity), of:
0 [length: 5]: hello
1 [length: 0]:  world
2 [length: 0]:  from
3 [length: 0]:  jor
4 [length: 3]: dan
5 [length: 0]:
...
19 [length: 0]:
From this we can see a few notable things. The last word has been split across two rows ("jor" and "dan"), and several of our strings have a reported length of 0, yet would seem to include spaces. The zero lengths mean some of our arrays are starting with the null-terminating byte ('\0'), and we are only seeing output because we manually walk the entirety of each array.
Note that most terminals will do "nothing" when a null character is to be printed, meaning we appear to skip directly to our "strings". We can better visualize what is happening by always printing something:
printf("%c", full[x][y] ? full[x][y] : '*');
In this case we print an asterisk when we encounter a null character, giving us the output:
0 [length: 5]: hello***************
1 [length: 0]: ***** world*********
2 [length: 0]: *********** from****
3 [length: 0]: **************** jor
4 [length: 3]: dan*****************
5 [length: 0]: ********************
...
19 [length: 0]: ********************
This very clearly shows where in memory our characters have been placed.
The primary issue is that in this loop
for (int j = end_word; j < start; j++) {
    full[counter][j] = name[j];
}
j is initialized to a position relative to the beginning of name, but is also used as the index into full[counter]. Excluding our first substring, when end_word is 0, this puts us farther and farther away from the zeroth index of each subarray, eventually crossing the borders between arrays.
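This happens to "work", rather than crash, because 2D arrays in C are laid out contiguously in memory, so writes past the end of one subarray land in the next one. Strictly speaking, indexing past the end of a subarray is undefined behavior, so this should not be relied upon.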
To fix this, we must copy our characters using a separate index that starts at zero for each subarray.
for (size_t j = end_word, k = 0; j < start; j++, k++) {
    full[counter][k] = name[j];
}
Now when we print our arrays out we can limit ourselves to our known number_of_words (for (size_t x = 0; x < number_of_words; x++)), giving us the output:
0 [length: 5]: hello***************
1 [length: 6]:  world**************
2 [length: 5]:  from***************
3 [length: 7]:  jordan*************
This looks roughly correct, but includes the preceding space in the "word". We can skip past these spaces by setting end_word to the next character instead:
start++;
end_word = start;
counter++;
Now our output looks properly split:
0 [length: 5]: hello***************
1 [length: 5]: world***************
2 [length: 4]: from****************
3 [length: 6]: jordan**************
Note that these are (now properly formatted) null-terminated strings, and could be printed using the %s specifier, as in:
for (size_t x = 0; x < number_of_words; x++)
    printf("%zu [length: %zu]: %s\n", x, strlen(full[x]), full[x]);
Overall this is a bit fragile, as it requires the trailing delimiting space in order to work, and will create an empty string each time a delimiting space is repeated (or if the source string starts with a space).
As an aside, this similar example should showcase a straightforward method for tokenizing a string, while skipping over all delimiters, and includes some important annotations.
#include <stdio.h>
#include <string.h>

int main(void) {
    char name[30] = "hello world from jordan";
    char copies[20][30] = { 0 };
    size_t length_of_copies = 0;
    size_t hold_position = 0;
    size_t substring_span = 0;
    size_t i = 0;

    do {
        /* our substring delimiters */
        if (name[i] == ' ' || name[i] == '\0') {
            /* only copy non-zero spans of non-delimiters */
            if (substring_span) {
                /* `strncpy` will not insert a null terminating character
                 * into the destination if it is not found within the span
                 * of characters of the source string...
                 */
                strncpy(
                    copies[length_of_copies],
                    name + hold_position,
                    substring_span
                );
                /* ...so we must manually insert a null terminating character
                 * (or otherwise rely on our memory being initialized to all-zeroes)
                 */
                copies[length_of_copies++][substring_span] = '\0';
                substring_span = 0;
            }
            /* let's assume our next position will be the start of a substring */
            hold_position = i + 1;
        } else
            substring_span++;
        /* checking our character at the end of the loop,
         * and incrementing after the fact,
         * lets us include the null terminating character as a delimiter,
         * as we will only fail to enter the loop after processing it
         */
    } while (name[i++] != '\0');

    for (size_t i = 0; i < length_of_copies; i++)
        printf("%zu: [%s]\n", i + 1, copies[i]);
}