I'm working on an assembler for a hypothetical machine (the SMAC-0 machine) and need some help with memory allocation.
I'm reading strings from a given file, tokenizing them, and storing the tokens through character pointers.
Here's a code snippet:
tokenCount = sscanf(buffer, "%s %s %s %s", tokenOne, tokenTwo, tokenThree, tokenFour);
where tokenCount is an integer, buffer is the temporary buffer that stores the line read from the input file, and tokenOne, tokenTwo, tokenThree, and tokenFour are character pointers.
Each line read from the file has one to four words, for example:
READ N
N: DS 1
SUM: DS 1
LOOP: MOVER AREG N
ADD AREG N
COMP AREG ='5'
BC LE LOOP
MOVEM AREG SUM
PRINT SUM
STOP
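My queries are:
(That question also applies to the buffer pointer, since the labels (e.g. LOOP, N, SUM) can be of variable sizes.)
scanf() or other input functions like gets(), do the same?
sscanf() does not allocate any memory for %s conversions. It writes each token to wherever the corresponding pointer points, so tokenOne through tokenFour must already point to buffers large enough for the longest token they can receive. To be on the safe side, make all of them as large as the input buffer itself: every token is a substring of the line held in buffer, so no token can be longer than that line. Avoid gets() entirely, since it cannot be told the size of its buffer (it was removed from the language in C11); fgets() is the bounded alternative. See the thread How to prevent scanf causing a buffer overflow in C? for more information.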
If you're using glibc (the GNU C library), you can use an extension that dynamically allocates the buffers on your behalf. Note that this is a feature of the C library, not of the GNU compiler. Check out Dynamic allocation with scanf().
Using predefined buffers for the scanned tokens:
Note that all token buffers have the same size as the input buffer:
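/* sscanf-test.c */
#include <stdio.h>
/* An enum constant is a true compile-time constant, so the arrays below
   are ordinary fixed-size arrays rather than variable-length arrays. */
enum { BufferSize = 256 };
int main(void)
{
    FILE *file = fopen("sample.txt", "r");
    if (file == NULL) {
        perror("sample.txt");
        return 1;
    }
    char buffer[BufferSize];
    /* Each token buffer is as large as the line buffer, so no token can overflow it. */
    char tokenOne[BufferSize];
    char tokenTwo[BufferSize];
    char tokenThree[BufferSize];
    char tokenFour[BufferSize];
    while (fgets(buffer, sizeof(buffer), file) != NULL)
    {
        /* Clear the tokens so that ones not scanned on this line print as empty. */
        tokenOne[0] = '\0';
        tokenTwo[0] = '\0';
        tokenThree[0] = '\0';
        tokenFour[0] = '\0';
        int tokenCount = sscanf(buffer, "%s %s %s %s", tokenOne, tokenTwo, tokenThree, tokenFour);
        printf("scanned %d tokens 1:%s 2:%s 3:%s 4:%s\n", tokenCount, tokenOne, tokenTwo, tokenThree, tokenFour);
    }
    fclose(file);
    return 0;
}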
The program produces the following output (I cleaned up the formatting a little bit to improve readability):
$ gcc sscanf-test.c -o sscanf-test
$ ./sscanf-test
scanned 2 tokens 1:READ 2:N 3: 4:
scanned 3 tokens 1:N: 2:DS 3:1 4:
scanned 3 tokens 1:SUM: 2:DS 3:1 4:
scanned 4 tokens 1:LOOP: 2:MOVER 3:AREG 4:N
scanned 3 tokens 1:ADD 2:AREG 3:N 4:
scanned 3 tokens 1:COMP 2:AREG 3:='5' 4:
scanned 3 tokens 1:BC 2:LE 3:LOOP 4:
scanned 3 tokens 1:MOVEM 2:AREG 3:SUM 4:
scanned 2 tokens 1:PRINT 2:SUM 3: 4:
scanned 1 tokens 1:STOP 2: 3: 4:
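If you want to store the scanned tokens for later processing, you have to copy them somewhere else inside the while loop, because the same token buffers are overwritten on every iteration. You can use strlen() to get the length of a token (excluding the trailing string terminator '\0'); remember to allocate one extra byte for the terminator when you copy it.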
Using dynamic memory allocation for tokens:
Like I said, you could also let scanf allocate buffers for you dynamically. The scanf(3) man page states that you can use GNU extensions 'a' or 'm' to do that. Specifically it says:
An optional 'a' character. This is used with string conversions, and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required. This is a GNU extension; C99 employs the 'a' character as a conversion specifier (and it can also be used as such in the GNU implementation)
I couldn't get scanf to work with the 'a' modifier, probably because of the clash with the C99 %a floating-point conversion mentioned in the quote. However, there's also the 'm' modifier, which does the same thing (and more):
Since version 2.7, glibc also provides the m modifier for the same purpose as the a modifier. The m modifier has the following advantages:
It may also be applied to %c conversion specifiers (e.g., %3mc).
It avoids ambiguity with respect to the %a floating-point conversion specifier (and is unaffected by gcc -std=c99 etc.)
It is specified in the upcoming revision of the POSIX.1 standard.
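/* sscanf-alloc.c */
#include <stdio.h>
#include <stdlib.h>
enum { BufferSize = 64 };
/* printf("%s", NULL) is undefined behavior (glibc happens to print "(null)"),
   so substitute that text explicitly for tokens that were not scanned. */
static const char *orNull(const char *s)
{
    return s != NULL ? s : "(null)";
}
int main(void)
{
    FILE *file = fopen("sample.txt", "r");
    if (file == NULL) {
        perror("sample.txt");
        return 1;
    }
    char buffer[BufferSize];
    char *tokenOne = NULL;
    char *tokenTwo = NULL;
    char *tokenThree = NULL;
    char *tokenFour = NULL;
    while (fgets(buffer, sizeof(buffer), file) != NULL)
    {
        // note: the '&': with %ms, sscanf needs a pointer to each char * so it can store the address of the buffer it allocates
        int tokenCount = sscanf(buffer, "%ms %ms %ms %ms", &tokenOne, &tokenTwo, &tokenThree, &tokenFour);
        printf("scanned %d tokens 1:%s 2:%s 3:%s 4:%s\n", tokenCount,
               orNull(tokenOne), orNull(tokenTwo), orNull(tokenThree), orNull(tokenFour));
        // note: the memory has to be freed to avoid leaks (free(NULL) is a harmless no-op)
        free(tokenOne);
        free(tokenTwo);
        free(tokenThree);
        free(tokenFour);
        // reset, so tokens not scanned on the next line stay NULL
        tokenOne = NULL;
        tokenTwo = NULL;
        tokenThree = NULL;
        tokenFour = NULL;
    }
    fclose(file);
    return 0;
}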
$ gcc sscanf-alloc.c -o sscanf-alloc
$ ./sscanf-alloc
scanned 2 tokens 1:READ 2:N 3:(null) 4:(null)
scanned 3 tokens 1:N: 2:DS 3:1 4:(null)
scanned 3 tokens 1:SUM: 2:DS 3:1 4:(null)
scanned 4 tokens 1:LOOP: 2:MOVER 3:AREG 4:N
scanned 3 tokens 1:ADD 2:AREG 3:N 4:(null)
scanned 3 tokens 1:COMP 2:AREG 3:='5' 4:(null)
scanned 3 tokens 1:BC 2:LE 3:LOOP 4:(null)
scanned 3 tokens 1:MOVEM 2:AREG 3:SUM 4:(null)
scanned 2 tokens 1:PRINT 2:SUM 3:(null) 4:(null)
scanned 1 tokens 1:STOP 2:(null) 3:(null) 4:(null)