Bizarre behavior from C program: Kernighan & Ritchie exercise 2-3

Question

Hi all,

I've written a program as a solution to Kernighan & Ritchie's exercise 2-3, and its behaviour during testing is (IMHO) wildly unintuitive.

The problem spec says to write a program that converts hex values to their decimal equivalent. The code I've written works fine for smaller hex values, but for larger hex values things get a little... odd. For example, if I input 0x1234, the decimal value 4660 pops out on the other end, which is the correct output (the code also works for letters, i.e. 0x1FC -> 508). If, on the other hand, I input a large hex value, say 0x123456789ABCDEF, I should get 81985529216486895, but instead I get 81985529216486896 (off by one!).

The error in conversion is inconsistent, sometimes with the decimal value being too high and other times too low. Generally, much larger hex values result in more incorrect place values in the decimal output.

Here's my program in its entirety:

/*Kernighan & Ritchie's Exercise 2-3

Write a function 'htoi' which converts a string of hexadecimal digits (including an 
optional 0x or 0X) into its equivalent integer value.
*/
#include <stdio.h>

#define MAXLINE 1000 //defines maximum size of a hex input

//FUNCTION PROTOTYPES
signed int htoi(char c); //converts a single hex digit to its decimal value

//BEGIN PROGRAM////////////////////////////////////////////////////////////
main()
{
   int i = 0; //counts the length of 'hex' at input
   char c; //character buffer
   char hex[MAXLINE]; //string from input
   int len = 0; //the final value of 'i'
   signed int val; //the decimal value of a character stored in 'hex'
   double n = 0; //the decimal value of 'hex'

   while((c = getchar()) != '\n') //store a string of characters in 'hex'
   {
      hex[i] = c;
      ++i;
   }
   len = i;
   hex[i] = '\0'; //turn 'hex' into a string

   if((hex[0] == '0') && ((hex[1] == 'x') || (hex[1] == 'X'))) //ignore leading '0x'
   {
      for(i = 2; i < len; ++i)
      {
        val = htoi(hex[i]); //call 'htoi'
        if(val == -1 ) //test for a non-hex character
        {
            break;
        }
        n = 16.0 * n + (double)val; //calculate decimal value of hex from hex[0]->hex[i]
      }
   }
   else
   {
      for(i = 0; i < len; ++i)
      {
          val = htoi(hex[i]); //call 'htoi'
          if(val == -1) //test for non-hex character
          {
             break;
          }
          n = 16.0 * n + (double)val; //calc decimal value of hex for hex[0]->hex[i]
      }
   }

 if(val == -1)
 {
    printf("\n!!THE STRING FROM INPUT WAS NOT A HEX VALUE!!\n");
 }
 else
 {
    printf("\n%s converts to %.0f\n", hex, n);
 }

 return 0;
 }

 //FUNCTION DEFINITIONS OUTSIDE OF MAIN()///////////////////////////////////
 signed int htoi(char c)
 {
   signed int val = -1;

   if(c >= '0' && c <= '9')
     val = c - '0';

   else if(c == 'a' || c == 'A')
     val = 10;

   else if(c == 'b' || c == 'B')
     val = 11;

   else if(c == 'c' || c == 'C')
     val = 12;

   else if(c == 'd' || c == 'D')
     val = 13;

   else if(c == 'e' || c == 'E')
     val = 14;

   else if(c == 'f' || c == 'F')
     val = 15;

   else 
   {
     ;//'c' was a non-hex character, do nothing and return -1
   }

   return val;
 }

pastebin: http://pastebin.com/LJFfwSN5

Any ideas on what is going on here?

Answer 1

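You are probably exceeding the precision with which double can store integers. A typical (IEEE 754) double has a 53-bit significand, so it can represent every integer exactly only up to 2^53 = 9007199254740992, and 0x123456789ABCDEF is larger than that.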

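My suggestion would be to change your code to use unsigned long long for the result, and also to add a check for overflow, e.g.:

unsigned long long n = 0;
// ...

// needs #include <limits.h> for ULLONG_MAX
if ( n > (ULLONG_MAX - val) / 16 )
{
    fprintf(stderr, "Number too big.\n");
    exit(EXIT_FAILURE);
}

n = n * 16 + val;

The check has to happen before the multiplication. When unsigned integer types overflow they wrap around (modulo 2^N) rather than trapping, so it is tempting to test n * 16 + val < n after the fact, but the wrapped result is not always smaller than n. With a 64-bit type, n = 0x1800000000000000 and val = 0 wraps to 0x8000000000000000, which is larger than n. Comparing n against (ULLONG_MAX - val) / 16 catches every overflow.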
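To make the suggestion concrete, here is a minimal sketch of the whole conversion with an unsigned long long accumulator. The names hex_digit_value and parse_hex and the return codes are made up for illustration; they are not from K&R or from the question. The overflow test divides before multiplying, so it does not depend on how a wrapped result compares with n.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
   Assumes 'a'..'f' and 'A'..'F' are contiguous (true for ASCII); the C
   standard only guarantees this for '0'..'9'. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical function: parses s (with an optional 0x/0X prefix) into *out.
   Returns 0 on success, -1 on a non-hex character or when there are no
   digits at all (an assumption; the question does not say what should
   happen then), and -2 if the value does not fit in unsigned long long. */
static int parse_hex(const char *s, unsigned long long *out)
{
    unsigned long long n = 0;
    size_t i = 0;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        i = 2;                      /* skip the prefix */
    if (s[i] == '\0')
        return -1;                  /* nothing to convert */

    for (; s[i] != '\0'; ++i) {
        int val = hex_digit_value(s[i]);
        if (val < 0)
            return -1;
        /* n * 16 + val exceeds ULLONG_MAX exactly when this is true */
        if (n > (ULLONG_MAX - (unsigned long long)val) / 16)
            return -2;
        n = n * 16 + (unsigned long long)val;
    }

    *out = n;
    return 0;
}
```

With this sketch, parse_hex("0x123456789ABCDEF", &v) returns 0 and leaves v at exactly 81985529216486895, because the arithmetic never leaves integer types; print it with %llu.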

If you want to add more precision than unsigned long long then you will have to get into more advanced techniques (probably beyond the scope of Ch. 2 of K&R but once you've finished the book you could revisit).

NB: You also need to #include <stdlib.h> if you take my suggestion of exit, and don't forget to change %.0f to %llu in your final printf. Also, a safer way to get the input (which K&R covers) is:

int c;
while((c = getchar()) != '\n' && c != EOF)

The first time I ran the code on ideone I got a segfault: I hadn't put a newline at the end of stdin, so the original loop never saw a '\n' and kept on storing EOF into hex until it overflowed the buffer.
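Stopping at EOF fixes the crash I hit, but a line longer than the buffer would still overflow hex. As a rough sketch, the loop can also be bounded by the buffer size; read_line here is a made-up helper name, and it takes a FILE * only so that it is easy to exercise on something other than stdin.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: reads one line from fp into buf, which has room for
   cap bytes (cap must be at least 1). Stops at '\n', at EOF, or when the
   buffer is full, and always null-terminates. Returns the number of
   characters stored. If the buffer fills up, the rest of the line is left
   unread in the stream. */
static size_t read_line(FILE *fp, char *buf, size_t cap)
{
    size_t i = 0;
    int c;  /* int, not char, so EOF can be told apart from every character */

    while (i + 1 < cap && (c = getc(fp)) != '\n' && c != EOF)
        buf[i++] = (char)c;

    buf[i] = '\0';
    return i;
}
```

In the program from the question this would be called as len = read_line(stdin, hex, MAXLINE); in place of the getchar loop.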

Answer 2

This is a classic example of floating point inaccuracy.

Unlike most of the examples of floating point errors you'll see, this is clearly not about non-binary fractions or very small numbers; in this case, the floating point representation is approximating very big numbers, with the accuracy decreasing the higher you go. The principle is the same as writing "1.6e10" to mean "approximately 16000000000" (I think I counted the zeros right there!), when the actual number might be 16000000001.

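You actually run out of accuracy sooner than with an integer of the same size, because only part of the width of a floating point variable holds the digits of the number; the rest stores the sign and the exponent. A typical 64-bit double has 53 bits of precision, so above 2^53 not every whole number can be represented exactly.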
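The effect can be checked directly. The sketch below assumes IEEE 754 binary64 doubles, round-to-nearest, and a 64-bit unsigned long long; gap_above and fits_in_double are made-up helper names for illustration.

```c
#include <float.h>

/* Hypothetical helper: spacing between x and the next larger double,
   for finite x >= 1. Finds the largest power of two not above x and
   scales DBL_EPSILON (the spacing at 1.0) by it. */
static double gap_above(double x)
{
    double p = 1.0;
    while (p * 2.0 <= x)
        p *= 2.0;
    return p * DBL_EPSILON;
}

/* Hypothetical helper: 1 if v converts to double and back unchanged. */
static int fits_in_double(unsigned long long v)
{
    double d = (double)v;
    /* 2^64 itself does not fit back into a 64-bit unsigned long long,
       and converting it would be undefined, so rule it out first. */
    if (d >= 18446744073709551616.0)
        return 0;
    return (unsigned long long)d == v;
}
```

Under those assumptions gap_above(4660.0) is far below 1, so small inputs convert exactly, while gap_above(81985529216486896.0) is 16. The value 0x123456789ABCDEF (81985529216486895) therefore has to be rounded to a multiple of 16, and the nearest one is 81985529216486896, which is exactly the output reported in the question.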

2 answers

solution1
2 ACCEPTED 2014-11-24 22:12:18

solution2
1 2014-11-24 22:21:45
