

CCS does not monitor this forum on a regular basis.

Please do not post bug reports on this forum. Send them to CCS Technical Support

IEEE 754 half-precision binary floating point

 
Wi1l



Joined: 03 Oct 2020
Posts: 3


Posted: Sat Oct 03, 2020 8:10 pm

Hi, can someone tell me how to convert a base-10 decimal number to 16-bit half-precision IEEE 754 binary floating point? I have tried the ieeefloat driver, but it only works for 32 bits.
Ttelmah



Joined: 11 Mar 2010
Posts: 19552


Posted: Sun Oct 04, 2020 12:32 am

Ouch. binary16. Probably the easiest way is going to be to extract the
parts from the standard float32 format.
Question: what PIC family are you using? (PIC16/18 or a PIC24/30/33).
Do you need to handle conversions both ways?

binary16 has a 10-bit fraction, a 5-bit exponent and a sign bit.
So, provided the number is inside the range this supports, you can
just extract the components from a standard 32-bit float:
Code:

typedef unsigned int16 float16;
union combiner {
   float32 fp;
   unsigned int8 bytes[4];
   unsigned int16 words[2];
   unsigned int32 longword;
}; //declare a union type, so it can be used as 'union combiner' below

float16 f32tof16(union combiner val)
{
    int1 sign;
    unsigned int32 fraction;
    unsigned int32 exponent;
    float16 result;
    //OK, we arrive with a standard float32 value. If we are using PCD
    //this is in IEEE format, otherwise Microchip. Using the union allows
    //access to the bytes in this
#if defined(__PCD__)
   //Here code for PCD compiler
   //Now need to extract the parts from the value.
   exponent = val.longword & 0x7F800000ul; //8bits
   sign=bit_test(val.longword,31); //top bit is sign
#else
   //Here we are on a PIC 16/18, so data is in different positions
   exponent = (val.longword & 0xFF000000ul)>>1; //8 bit exponent
   sign=bit_test(val.longword,23); //bit 23 is sign
#endif
   fraction = val.longword & 0x007FFFFFul; //low 23 bits
   //Now really should test for zero and infinity here
   //However not doing this. Assume number is already small enough
   //to fit.
   //So the result needs the low five bits of the exponent, with the high
   //10 bits of the fraction, and the sign.
   result=(fraction>>13) & 0x3FF; //high ten bits
   result+=(exponent>>13) & 0x7C00; //extract exponent five bits
   if (sign)
      bit_set(result,15); //set sign bit
   //So now should have the required 16bits.
   return result;
}


Completely untested, just 'created in my mind', but this should give
the binary16 equivalent of a float32 value.

If you need to go the other way, you'd have to do the opposite, and
rebuild the float32 value from the parts of the float16.

To do it more correctly, you would probably need to perform some
form of rounding on the conversion. This just clips.
Wi1l



Joined: 03 Oct 2020
Posts: 3


Posted: Sun Oct 04, 2020 10:28 am

Quote:
What PIC family are you using? (PIC16/18 or a PIC24/30/33).
Do you need to handle conversions both ways?.


Hi Ttelmah, I am using a PIC18 and need to handle conversions both ways. I need to do this because the IEEE data comes from an RTU and I need to do mathematical operations on it. For this I need to convert the IEEE value to float, do the operation, and convert the result back to IEEE.

Here is my test code. I can't convert the test number (11.75 in IEEE 754 half-precision representation) to float.

Code:
#include <18F4520.h>
#fuses HS,WDT32768,PROTECT,NOLVP,NOBROWNOUT
#use delay(clock=20MHz)
#use rs232(baud=9600, xmit=PIN_C6, rcv=PIN_C7)

#include <ieeefloat.c>
#include <math.h>

int16 dataIEEE;
float resultFloat;
int16 resultIEEE;

void main(){
   dataIEEE = 0x49E0;                        // 11.75 In IEEE 754 half-precision representation
   resultFloat = f_IEEEtoPIC(dataIEEE);

   printf("My Float Number = %8.4f\r\n",resultFloat);
   delay_ms(100);
   printf("My IEEE Number = %LX\r\n",f_PICtoIEEE(resultFloat));
   delay_ms(100);

while(TRUE);
}
Ttelmah



Joined: 11 Mar 2010
Posts: 19552


Posted: Sun Oct 04, 2020 11:25 am

No, you won't, not with that driver. You need to write the conversion code yourself.

CCS does not support binary16 format (few people do; the accuracy
is so low, only about 3.3 significant digits). The ieeefloat library only
converts numbers between the Microchip float format and the IEEE 32-bit format.

Somebody may well have posted code for this. A search should find it.

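I doubt I've handled the exponent correctly. It needs re-biasing: float32
stores the exponent with a bias of 127 and binary16 with a bias of 15, so you
have to subtract 127 and add 15 (a net subtraction of 112) before shifting it
into place.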
Ttelmah



Joined: 11 Mar 2010
Posts: 19552


Posted: Mon Oct 05, 2020 3:55 am

OK. I've sat down and written the code to do this. It uses the ieeefloat
driver to do the final conversion for the PIC16/18. I've tested it on a PIC24,
and on this it works. Haven't tested on the PIC16/18.
Code:

//Routines to convert a float32 to a float16, and vice versa.

//First define the types needed
#include "stdint.h"
typedef uint16_t float16_t; //Use an int16 to hold the new float
union combiner {
   float32 fp;
   uint8_t bytes[4];
   uint16_t words[2];
   uint32_t longword;
}; //To allow the parts to be accessed

//Now want to be able to define 32-bit unsigned constants as 'U32(x)' on any processor
#if defined(__PCD__)
   #define U32(x) x##ul
#else
   #define U32(x) x##ull
#endif

//Now the limit values for the new type

// Smallest positive short float
#define SFLT_MIN 5.96046448e-08
// Smallest positive normalized short float
#define SFLT_NRM_MIN 6.10351562e-05
// Largest positive short float
#define SFLT_MAX 65504.0
// Smallest positive e
// for which (1.0 + e) != (1.0)
#define SFLT_EPSILON 0.00097656
// Number of digits in mantissa
// (significand + hidden leading 1) so actually 10 stored
#define SFLT_MANT_DIG 11
//Number of actual stored bits
#define SFLT_SIGBITS 10
// Number of base 10 digits that
// can be represented without change
#define SFLT_DIG 2
// Base of the exponent
#define SFLT_RADIX 2
// Minimum negative integer such that
// SFLT_RADIX raised to the power of
// one less than that integer is a
// normalized short float
#define SFLT_MIN_EXP -13
// Maximum positive integer such that
// SFLT_RADIX raised to the power of
// one less than that integer is a
// normalized short float
#define SFLT_MAX_EXP 16
// Minimum positive integer such
// that 10 raised to that power is
// a normalized short float
#define SFLT_MIN_10_EXP -4
// Maximum positive integer such
// that 10 raised to that power is
// a normalized short float
#define SFLT_MAX_10_EXP 4
//Now the value for infinity in this format
#define SFLT_INF U32(0x7C00)

//Function to convert a float32 to float16.
float16_t float32to16(float v)
{
   uint32_t sign;
   uint32_t mantissa, half_mantissa;
   uint32_t exponent;
   uint32_t round_bit;
   signed int32 unbiased;
   uint32_t temp32;
   union combiner val;
#if defined(__PCD__)
   val.fp=v; //PCD floats are already IEEE format
#else
   //Here code for PCB/PCM/PCH compiler
   //Need to convert to IEEE format
   val.longword=f_PICtoIEEE(v);
   //Here we are on a PIC 16/18, so data is in different positions
   //Originally extracting directly, but now using the IEEE conversion
   //routines. Found otherwise there is an issue with the exponent
   //not being normalised on the MicroChip format... :(
#endif   

   //Now need to extract the parts from the value.
   exponent = val.longword & U32(0x7F800000); //8bits
   sign=val.longword & U32(0x80000000); //top bit is sign

   mantissa=val.longword & U32(0x7FFFFF);
   //Now test for infinity
   if (exponent==U32(0x7F800000))
   {
      //Maximum exponent in source.
      //Two possibilities: if the mantissa is zero this is infinity,
      //otherwise it is a NaN, so set the quiet bit (0x200) as well.
      if (mantissa==0)
         temp32=0;
      else
         temp32=U32(0x200);
      //Now have to build the 16bit value with sign, mantissa & exponent
      return ((sign>>16) | SFLT_INF | temp32 | (mantissa>>13));
   }
   //So now have to build the half precision value
   sign>>=16; //Move the sign down to bit 15
   //Now need to convert exponent
   unbiased = (((signed int32)(exponent>>23))-127);
   //Now add 15 to generate the exponent for the float16
   unbiased +=15;
   //at this point the re-biased exponent must be 1 to 0x1E for a normal float16
   
   if (unbiased>=0x1F)
   {
      //here exponent is too large, so return +/- infinity
      return (sign | SFLT_INF);
   }
   
   if (unbiased <=0)
   {
      //Now check for underflow
      if ((14-unbiased)>24)
      {
         //full underflow
         return (sign); //gives a 'signed zero'
      }
      //Now need to add in the missing mantissa bit
      mantissa |= U32(0x800000);
      half_mantissa=mantissa>>(14-unbiased);
      //Now test for rounding
      round_bit=U32(1)<<(13-unbiased);
      if ((mantissa & round_bit) != 0 && (mantissa & (3 * round_bit - 1)) != 0)
         half_mantissa++;
      //No exponent for this   
      return (sign | half_mantissa);
   }
   //Now move the exponent to final location - need this to be done as unsigned
   unbiased = (unsigned int32)(unbiased)<<10;
   half_mantissa=mantissa>>13;
   //Now test for rounding
   round_bit=U32(0x1000);
   if ((mantissa & round_bit) != 0 && (mantissa & (3 * round_bit - 1)) != 0)
   {
      // Round it
      return ((sign | unbiased | half_mantissa) + 1);
   }
   else
   {
      return (sign | unbiased | half_mantissa);
   }   
}

//Now the reverse of the above. Handed a float16, generates a float32
float32 float16to32(float16_t val)
{
   //This is actually quite a bit simpler, since there is no rounding or limits:
   //anything that can be held in a float16 can be represented by a float32.
   //First test if we have been given a zero.
   if ((val & 0x7FFF)==0)
   {
      if (bit_test(val,15)) //return 0.0 with the same sign
         return (-0.0);
      else
         return (0.0);
   }
   union combiner result;
   uint32_t half_sign;
   signed int32 half_exp;
   signed int32 exponent;
   signed int32 leading;
   uint8_t digit;
   uint32_t half_mantissa,mantissa;
   half_sign=val & 0x8000u;
   half_exp=val & 0x7C00u;
   half_mantissa=val & 0x3FF;
   half_sign<<=16; //Move the sign up to bit 31
   if (half_exp==SFLT_INF)
   {
      //Infinity (zero mantissa), or NaN (mantissa kept, with the quiet bit set)
      result.longword=(half_sign | U32(0x7F800000));
      if (half_mantissa!=0)
         result.longword |= (U32(0x400000) | (half_mantissa<<13));
   }
   else if (half_exp==0)
   {
      //Subnormal: the value is half_mantissa * 2^-24, so it must be normalised.
      //Count the leading zeros, working down from bit 10 (always zero here).
      digit=10;
      leading=0;
      while (bit_test(half_mantissa,digit)==0)
      {
         leading++;
         if (digit==0)
            break; //abort if finished
         digit--;
      }
      //The top set bit is bit (10-leading), worth 2^(-14-leading), so the
      //float32 biased exponent is 127-14-leading = 113-leading
      exponent=(113-leading)<<23;
      //Shift the leading 1 up to bit 10, then drop it (it becomes the hidden bit)
      mantissa=((half_mantissa<<leading) & U32(0x3FF))<<13;
      result.longword=(half_sign | exponent | mantissa);
   }
   else
   {
      //Normal value: re-bias the exponent (-15 +127) and widen the mantissa
      exponent=((half_exp>>10)-15+127)<<23;
      mantissa=half_mantissa<<13;
      result.longword=(half_sign | exponent | mantissa);
   }
#if defined(__PCD__)
   //Here code for PCD compiler
   return result.fp; //IEEE format already
#else
   //Here we are on a PIC 16/18, so the IEEE bits have to be converted to
   //the Microchip format (Inf/NaN may not give a meaningful result here)
   return f_IEEEtoPIC(result.longword);
#endif
}



//Now basic test code.

#include <24FJ128GA006.h>
#device ICSP=1
#use delay(crystal=20000000)

#FUSES NOWDT                    //No Watch Dog Timer
#FUSES CKSFSM                   //Clock Switching is enabled, fail Safe clock monitor is enabled
#use rs232(UART1, ERRORS)

#include <ieeefloat.c>
#include "binary16.h"

void main()
{
   //Now a couple of tests
   float16_t test;
   float32 fpval=100.0;
   test=float32to16(fpval);
   
   //Now does this look right?.
   printf("%04x test\r",test);
   
   //now convert back
   fpval=float16to32(test);
   
   printf("%4.1f fp\r", fpval);

   while(TRUE)
   {
   }
}


Examples:
Code:

//PIC24
#include <24FJ128GA006.h>
#device ICSP=1
#use delay(crystal=20000000)

#FUSES NOWDT                    //No Watch Dog Timer
#FUSES CKSFSM                   //Clock Switching is enabled, fail Safe clock monitor is enabled
#use rs232(UART1, ERRORS)

#include <ieeefloat.c>
#include "binary16.h"

void main()
{
   //Now a couple of tests
   float16_t test;
   float32 fpval;
   fpval=100.0;
   test=float32to16(fpval);
   
   //Now does this look right?.
   printf("%04x test\r",test);
   
   //now convert back
   fpval=float16to32(test);
   
   printf("%4.1f fp\r", fpval);
   
   fpval=float16to32(0x49E0);
   
   printf("%4.1f fp\r", fpval);   

   while(TRUE)
   {
   }
}


Code:

//PIC18
#include <18F4520.h>
#device ICSP=1
#use delay(crystal=20000000)

#FUSES NOWDT                    //No Watch Dog Timer
//#FUSES CKSFSM                   //Clock Switching is enabled, fail Safe clock monitor is enabled
#use rs232(UART1, ERRORS)

#include <ieeefloat.c>
#include "binary16.h"

void main()
{
   //Now a couple of tests
   float16_t test;
   float32 fpval;
   fpval=100.0;
   test=float32to16(fpval);
   
   //Now does this look right?.
   printf("%04x test\r",test);
   
   //now convert back
   fpval=float16to32(test);
   
   printf("%4.1f fp\r", fpval);
   
   fpval=float16to32(0x49E0);
   
   printf("%4.1f fp\r", fpval);   

   while(TRUE)
   {
   }
}

Have modified this. Found there is a problem as originally posted on the
PIC16/18. In this format, the compiler uses an exponent of zero for
certain values, which causes issues. So I have rewritten it, and added a PIC18
example.


Last edited by Ttelmah on Tue Oct 06, 2020 2:07 am; edited 2 times in total
Wi1l



Joined: 03 Oct 2020
Posts: 3


Posted: Mon Oct 05, 2020 12:42 pm

Hi Ttelmah, when I try to compile it for a PIC18 using CCS V4.074, I get an "Unknown type" error in this part of the code:

Code:
typedef uint16_t float16_t; //Use an int16 to hold the new float
union combiner {
   float32 fp;
   uint8_t bytes[4];
   uint16_t words[2];
   uint32_t longword;
}; //To allow the parts to be accessed


What could be the cause?
Ttelmah



Joined: 11 Mar 2010
Posts: 19552


Posted: Tue Oct 06, 2020 12:49 am

You shouldn't get that, provided you have this line first:

#include "stdint.h"

This gives the definitions for uint8_t etc.

I just changed the processor include to 18F4520.h, removed the
#FUSES CKSFSM line, and the code compiled as posted.

I've edited the original file, with another change I found necessary for the
PIC18, and have posted a PIC18 example as well. Both work as posted.
All times are GMT - 6 Hours