How to separate/decode binary string from an input file? (C)

My program manages to read in an input text file line by line, and now I'm trying to process a specific section. The input looks like this:

^Other stuff^
CODE
10001100001000100000000000000000
00000000010000110010000000100000
10101100001001000000000000000000
(End of file after CODE section)

Each binary number is a MIPS instruction, and there can be any number of lines (I used 3 as an example). I need to separate each 32-bit line into 6 fields: OpCode, rs, rt, rd, shamt, funct (6, 5, 5, 5, 5, 6 bits respectively).

What I've already done: My program can already read the input text and call the proper function for each section. Here is a small snippet for context:

char line[64];

int mode = 0; //for parsing sections, 1 = REGISTER, 2 = MEMORY, 3 = CODE

while (fgets(line, sizeof line, filePtr) != NULL) {
    if (strcmp(line, "REGISTERS\n") == 0) {
        mode = 1;
    }

    else if (strcmp(line, "MEMORY\n") == 0) {
        mode = 2;
    }

    else if (strcmp(line, "CODE\n") == 0) {
        mode = 3;
    }

    switch (mode) {
    case 1:
        parseRegisters(line, registers); //this part works
        break;
    case 2:
        parseMemory(line, memory);  //this part works
        break;
    case 3:
        parseCode(line);  //have not started writing this function
        break;
    default:
        break;
    }
}

So at this section, it starts calling parseCode() for each binary line. How do I write parseCode() to separate each line into its 6 sections? I then need to perform different actions depending on what the sections decode to, but I already have that covered.

A string of zeros (0) and ones (1) can be converted into a 32-bit number using strtoul:

char *end;
uint32_t value = strtoul(line, &end, 2);

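You can use end to make sure the entire line was properly consumed. Because fgets keeps the trailing newline, end should point at '\n' (or '\0' on a final line without one) after a valid line. The third argument specifies the base to use for the conversion.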
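As a rough sketch of that validation step: the helper below (parse_binary_line is a hypothetical name, not part of the question's code) accepts a line only if it consists of exactly 32 binary digits followed by an optional line ending. It uses strspn first because strtoul on its own would also skip leading whitespace and accept a sign.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts one line of '0'/'1' characters into a
 * 32-bit value. Returns 1 on success, 0 if the line is malformed. */
int parse_binary_line(const char *line, uint32_t *out)
{
    /* Count the leading binary digits; an instruction needs exactly 32. */
    size_t digits = strspn(line, "01");
    if (digits != 32)
        return 0;

    char *end;
    errno = 0;
    unsigned long value = strtoul(line, &end, 2);
    if (errno != 0 || end != line + digits)
        return 0;

    /* fgets keeps the line ending, so allow "\n", "\r\n" or nothing. */
    if (*end == '\r') end++;
    if (*end == '\n') end++;
    if (*end != '\0')
        return 0;

    *out = (uint32_t)value;
    return 1;
}
```

A side effect of this check is that the "CODE" header line, which the loop in the question also passes to parseCode(), is rejected instead of being decoded as an instruction.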

Once you have the value, you can mask out the relevant fields using regular bit-masking logic. Shifts and masks operate on the numeric value, so the result does not depend on the platform's endianness. The first character of each line is the most significant bit, so the opcode sits in bits 31..26 and funct in bits 5..0:

#define OPCODE_MASK (0x3FUL << 26)
#define RS_MASK     (0x1FUL << 21)
#define RT_MASK     (0x1FUL << 16)
#define RD_MASK     (0x1FUL << 11)
#define SHAMT_MASK  (0x1FUL <<  6)
#define FUNCT_MASK  (0x3FUL <<  0)

#define GET_OPCODE(X) (((X) & OPCODE_MASK) >> 26)
#define GET_RS(X)     (((X) & RS_MASK)     >> 21)
#define GET_RT(X)     (((X) & RT_MASK)     >> 16)
#define GET_RD(X)     (((X) & RD_MASK)     >> 11)
#define GET_SHAMT(X)  (((X) & SHAMT_MASK)  >>  6)
#define GET_FUNCT(X)  (((X) & FUNCT_MASK)  >>  0)
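For illustration, here is a minimal sketch of the field extraction as a function. It assumes the standard MIPS R-type layout, with the opcode in the six most significant bits and funct in the six least significant, which matches the sample lines being written opcode-first. The struct rtype and decode_rtype names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the six R-type fields. */
struct rtype {
    unsigned opcode, rs, rt, rd, shamt, funct;
};

/* Splits a 32-bit instruction word, assuming the standard MIPS R-type
 * layout: opcode in bits 31..26 down to funct in bits 5..0. */
struct rtype decode_rtype(uint32_t word)
{
    struct rtype f;
    f.opcode = (word >> 26) & 0x3F;  /* 6 bits */
    f.rs     = (word >> 21) & 0x1F;  /* 5 bits */
    f.rt     = (word >> 16) & 0x1F;  /* 5 bits */
    f.rd     = (word >> 11) & 0x1F;  /* 5 bits */
    f.shamt  = (word >>  6) & 0x1F;  /* 5 bits */
    f.funct  =  word        & 0x3F;  /* 6 bits */
    return f;
}
```

For the second sample line (0x00432020) this yields opcode 0, rs 2, rt 3, rd 4, shamt 0, funct 32, which is the encoding of add $4, $2, $3. The first and third lines have opcodes 35 (lw) and 43 (sw); those are I-type instructions, so their low 16 bits form an immediate and the rd/shamt/funct split is not meaningful for them.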

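Alternatively, if you don't mind implementation-defined behavior, you can attempt to unpack the value via a union with bit-fields. The order in which bit-fields are allocated is up to the compiler; the declaration below assumes the first field lands in the least significant bits (as GCC and Clang do on little-endian targets), so funct is declared first and opcode last. The anonymous struct requires C11.

union decode {
    uint32_t code;
    struct {
        uint32_t funct  : 6;   /* bits  5..0  */
        uint32_t shamt  : 5;   /* bits 10..6  */
        uint32_t rd     : 5;   /* bits 15..11 */
        uint32_t rt     : 5;   /* bits 20..16 */
        uint32_t rs     : 5;   /* bits 25..21 */
        uint32_t opcode : 6;   /* bits 31..26 */
    };
};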
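Since the packing order of bit-fields is implementation-defined, it may be worth probing it once at startup before trusting such a union. A small sketch (bitfields_start_at_lsb is a hypothetical name) that reports which end of the word the first-declared field lands in:

```c
#include <stdint.h>

/* Hypothetical probe: returns 1 if this compiler places the
 * first-declared bit-field in the least significant bits, 0 otherwise. */
int bitfields_start_at_lsb(void)
{
    union {
        uint32_t code;
        struct {
            uint32_t low  : 6;
            uint32_t rest : 26;
        } f;
    } probe;

    probe.code = 0x3F;            /* only the six lowest bits are set */
    return probe.f.low == 0x3F;   /* true if 'low' maps onto those bits */
}
```

If the probe returns 1, a union for MIPS words needs funct declared first and opcode last; if it returns 0, the declaration order has to be reversed. The shift-and-mask approach avoids this question entirely.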

