The Open Group Base Specifications Issue 7
IEEE Std 1003.1-2008, 2016 Edition
Copyright © 2001-2016 The IEEE and The Open Group

NAME

float.h - floating types

SYNOPSIS

#include <float.h>

DESCRIPTION

[CX] [Option Start] The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1-2008 defers to the ISO C standard. [Option End]

The characteristics of floating types are defined in terms of a model that describes a representation of floating-point numbers and values that provide information about an implementation's floating-point arithmetic.

The following parameters are used to define the model for each floating-point type:

s
Sign (±1).
b
Base or radix of exponent representation (an integer >1).
e
Exponent (an integer between a minimum emin and a maximum emax).
p
Precision (the number of base-b digits in the significand).
fk
Non-negative integers less than b (the significand digits).

A floating-point number x is defined by the following model:

x = s b^{e} \sum_{k=1}^{p} f_k b^{-k}, \qquad e_{min} \le e \le e_{max}

In addition to normalized floating-point numbers (f1>0 if x!=0), floating types may be able to contain other kinds of floating-point numbers, such as subnormal floating-point numbers (x!=0, e=emin, f1=0) and unnormalized floating-point numbers (x!=0, e>emin, f1=0), and values that are not floating-point numbers, such as infinities and NaNs. A NaN is an encoding signifying Not-a-Number. A quiet NaN propagates through almost every arithmetic operation without raising a floating-point exception; a signaling NaN generally raises a floating-point exception when occurring as an arithmetic operand.
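As an illustration only, the model can be evaluated directly from its parameters. The following is a minimal sketch; the function name model_value and its argument layout are hypothetical and are not part of <float.h>. The digits f1..fp are passed in an array, and the result is computed in double, so it is exact only when the modeled value is representable in that type.

```c
/* Evaluate x = s * b^e * (f1*b^-1 + f2*b^-2 + ... + fp*b^-p).
 * Hypothetical helper for illustration; not part of <float.h>. */
double model_value(int s, int b, int e, const int *f, int p)
{
    double significand = 0.0;
    double weight = 1.0;
    double scale = 1.0;
    int i;

    /* Accumulate the significand digit by digit: f[k] carries weight b^-(k+1). */
    for (i = 0; i < p; i++) {
        weight /= b;
        significand += f[i] * weight;
    }

    /* Compute b^e by repeated multiplication or division, avoiding <math.h>. */
    for (i = 0; i < e; i++)
        scale *= b;
    for (i = 0; i > e; i--)
        scale /= b;

    return s * significand * scale;
}
```

For example, with s=1, b=2, e=3 and the digits 1, 0, 1, the significand is 1/2 + 1/8 = 0.625 and the modeled value is 0.625 * 8 = 5. Because f1 is non-zero, this is a normalized number in the sense described above.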

An implementation may give zero and non-numeric values, such as infinities and NaNs, a sign, or may leave them unsigned. Wherever such values are unsigned, any requirement in POSIX.1-2008 to retrieve the sign shall produce an unspecified sign and any requirement to set the sign shall be ignored.

The accuracy of the floating-point operations ('+', '-', '*', '/') and of the functions in <math.h> and <complex.h> that return floating-point results is implementation-defined, as is the accuracy of the conversion between floating-point internal representations and string representations performed by the functions in <stdio.h>, <stdlib.h>, and <wchar.h>. The implementation may state that the accuracy is unknown.
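Because conversion accuracy is implementation-defined, a program that depends on text round trips may want to test them rather than assume them. The sketch below uses a hypothetical helper, round_trips, which prints a finite double with DECIMAL_DIG significant digits using snprintf() and reads it back with strtod(). It assumes a 64-byte buffer is large enough for the formatted value, which holds for common double formats but is not guaranteed by this page.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Return 1 if x survives a conversion to decimal text and back unchanged,
 * 0 otherwise. Intended for finite values; a NaN never compares equal. */
int round_trips(double x)
{
    char buf[64];
    /* DECIMAL_DIG significant digits: one before the point, the rest after. */
    int n = snprintf(buf, sizeof buf, "%.*e", DECIMAL_DIG - 1, x);

    if (n < 0 || (size_t)n >= sizeof buf)
        return 0;               /* formatting failed or output was truncated */

    return strtod(buf, NULL) == x;
}
```

A result of 0 for a finite value would indicate that the implementation's conversions are not accurate enough to preserve that value at DECIMAL_DIG digits.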

All integer values in the <float.h> header, except FLT_ROUNDS, shall be constant expressions suitable for use in #if preprocessing directives; all floating values shall be constant expressions. All except DECIMAL_DIG, FLT_EVAL_METHOD, FLT_RADIX, and FLT_ROUNDS have separate names for all three floating-point types. The floating-point model representation is provided for all values except FLT_EVAL_METHOD and FLT_ROUNDS.
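Since the integer values other than FLT_ROUNDS are usable in #if directives, a program can select code at preprocessing time. The following sketch tests whether double has the parameters commonly associated with the IEC 60559 double format (radix 2, 53 significand digits, maximum exponent 1024). The macro and function names are hypothetical, and matching these three parameters does not by itself prove full IEC 60559 conformance.

```c
#include <float.h>

/* Evaluated entirely by the preprocessor; FLT_ROUNDS could not be used here. */
#if FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024
#define DOUBLE_LOOKS_LIKE_BINARY64 1
#else
#define DOUBLE_LOOKS_LIKE_BINARY64 0
#endif

/* Run-time accessor for the compile-time result. */
int double_looks_like_binary64(void)
{
    return DOUBLE_LOOKS_LIKE_BINARY64;
}
```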

The rounding mode for floating-point addition is characterized by the implementation-defined value of FLT_ROUNDS:

-1
Indeterminable.
 0
Toward zero.
 1
To nearest.
 2
Toward positive infinity.
 3
Toward negative infinity.

All other values for FLT_ROUNDS characterize implementation-defined rounding behavior.
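The list above can be turned into a small lookup, sketched below. The function names are hypothetical. FLT_ROUNDS is read at run time because, unlike the other integer values in this header, it is not required to be a constant expression.

```c
#include <float.h>
#include <stdio.h>

/* Map a FLT_ROUNDS value to the description listed for it. */
const char *rounding_description(int mode)
{
    switch (mode) {
    case -1: return "indeterminable";
    case 0:  return "toward zero";
    case 1:  return "to nearest";
    case 2:  return "toward positive infinity";
    case 3:  return "toward negative infinity";
    default: return "implementation-defined";
    }
}

/* Print the rounding mode currently reported by the implementation. */
void print_rounding_mode(void)
{
    int mode = FLT_ROUNDS;   /* may be a function call, so evaluate it once */
    printf("FLT_ROUNDS = %d (%s)\n", mode, rounding_description(mode));
}
```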

The values of operations with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized by the implementation-defined value of FLT_EVAL_METHOD:

-1
Indeterminable.
 0
Evaluate all operations and constants just to the range and precision of the type.
 1
Evaluate operations and constants of type float and double to the range and precision of the double type; evaluate long double operations and constants to the range and precision of the long double type.
 2
Evaluate all operations and constants to the range and precision of the long double type.

All other negative values for FLT_EVAL_METHOD characterize implementation-defined behavior.
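One way to observe the evaluation method is through the float_t and double_t types, which ISO C ties to FLT_EVAL_METHOD for the values 0, 1, and 2. Those types come from <math.h>, not from this header, so the sketch below relies on that assumption. The function name is hypothetical, and comparing sizes is only a rough proxy for comparing types.

```c
#include <float.h>
#include <math.h>   /* float_t and double_t; no <math.h> functions are called */

/* Return 1 if the sizes of float_t and double_t are consistent with
 * FLT_EVAL_METHOD, 0 if they are not. Values other than 0, 1, and 2
 * leave the types implementation-defined, so nothing is checked. */
int eval_types_match_method(void)
{
#if FLT_EVAL_METHOD == 0
    return sizeof(float_t) == sizeof(float)
        && sizeof(double_t) == sizeof(double);
#elif FLT_EVAL_METHOD == 1
    return sizeof(float_t) == sizeof(double)
        && sizeof(double_t) == sizeof(double);
#elif FLT_EVAL_METHOD == 2
    return sizeof(float_t) == sizeof(long double)
        && sizeof(double_t) == sizeof(long double);
#else
    return 1;
#endif
}
```

Code that stores intermediate results in float_t or double_t avoids narrowing them below the format the implementation already evaluates in.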

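The <float.h> header shall define the following values as constant expressions with implementation-defined values that are greater or equal in magnitude (absolute value) to those shown, with the same sign.

FLT_RADIX
Radix of exponent representation, b.
FLT_RADIX 2
FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG
Number of base-FLT_RADIX digits in the floating-point significand, p.
DECIMAL_DIG
Number of decimal digits, n, such that any floating-point number in the widest supported floating type can be rounded to a floating-point number with n decimal digits and back again without change to the value.
DECIMAL_DIG 10
FLT_DIG, DBL_DIG, LDBL_DIG
Number of decimal digits, q, such that any floating-point number with q decimal digits can be rounded into a floating-point number with p radix b digits and back again without change to the q decimal digits.
FLT_DIG 6
DBL_DIG 10
LDBL_DIG 10
FLT_MIN_EXP, DBL_MIN_EXP, LDBL_MIN_EXP
Minimum negative integer such that FLT_RADIX raised to that power minus 1 is a normalized floating-point number, emin.
FLT_MIN_10_EXP, DBL_MIN_10_EXP, LDBL_MIN_10_EXP
Minimum negative integer such that 10 raised to that power is in the range of normalized floating-point numbers.
FLT_MIN_10_EXP -37
DBL_MIN_10_EXP -37
LDBL_MIN_10_EXP -37
FLT_MAX_EXP, DBL_MAX_EXP, LDBL_MAX_EXP
Maximum integer such that FLT_RADIX raised to that power minus 1 is a representable finite floating-point number, emax.
FLT_MAX_10_EXP, DBL_MAX_10_EXP, LDBL_MAX_10_EXP
Maximum integer such that 10 raised to that power is in the range of representable finite floating-point numbers.
FLT_MAX_10_EXP +37
DBL_MAX_10_EXP +37
LDBL_MAX_10_EXP +37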

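The <float.h> header shall define the following values as constant expressions with implementation-defined values that are greater than or equal to those shown:

FLT_MAX, DBL_MAX, LDBL_MAX
Maximum representable finite floating-point number.
FLT_MAX 1E+37
DBL_MAX 1E+37
LDBL_MAX 1E+37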

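The <float.h> header shall define the following values as constant expressions with implementation-defined (positive) values that are less than or equal to those shown:

FLT_EPSILON, DBL_EPSILON, LDBL_EPSILON
The difference between 1 and the least value greater than 1 that is representable in the given floating-point type, b^(1-p).
FLT_EPSILON 1E-5
DBL_EPSILON 1E-9
LDBL_EPSILON 1E-9
FLT_MIN, DBL_MIN, LDBL_MIN
Minimum normalized positive floating-point number, b^(emin-1).
FLT_MIN 1E-37
DBL_MIN 1E-37
LDBL_MIN 1E-37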
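The three kinds of bound described above (at least as large in magnitude, at least as large, and at most as large) can be checked mechanically. The sketch below uses the minimum limits that ISO C gives for these macros, for example FLT_DIG of at least 6, each *_MAX of at least 1E+37, and DBL_EPSILON of at most 1E-9. The function name is hypothetical. Integer limits are checked with #if, while the floating limits are compared at run time because floating values cannot appear in #if.

```c
#include <float.h>

/* Integer limits: greater or equal in magnitude, with the same sign. */
#if FLT_RADIX < 2 || DECIMAL_DIG < 10
#error "FLT_RADIX or DECIMAL_DIG is below the ISO C minimum"
#endif
#if FLT_DIG < 6 || DBL_DIG < 10 || LDBL_DIG < 10
#error "a *_DIG value is below the ISO C minimum"
#endif
#if FLT_MIN_10_EXP > -37 || DBL_MIN_10_EXP > -37 || LDBL_MIN_10_EXP > -37
#error "a *_MIN_10_EXP value is smaller in magnitude than -37"
#endif
#if FLT_MAX_10_EXP < 37 || DBL_MAX_10_EXP < 37 || LDBL_MAX_10_EXP < 37
#error "a *_MAX_10_EXP value is below +37"
#endif

/* Floating limits: return 1 if every bound holds, 0 otherwise. */
int floating_limits_ok(void)
{
    return FLT_MAX >= 1E+37F && DBL_MAX >= 1E+37 && LDBL_MAX >= 1E+37L
        && FLT_EPSILON <= 1E-5F && DBL_EPSILON <= 1E-9
        && LDBL_EPSILON <= 1E-9L
        && FLT_MIN <= 1E-37F && DBL_MIN <= 1E-37 && LDBL_MIN <= 1E-37L;
}
```

On a conforming implementation the #error directives never fire and the function returns 1, so the value of such a check is mainly as documentation of what portable code may assume.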


The following sections are informative.

APPLICATION USAGE

None.

RATIONALE

All known hardware floating-point formats satisfy the property that the exponent range is larger than the number of mantissa digits. The ISO C standard permits a floating-point format where this property is not true, such that the largest finite value would not be integral; however, it is unlikely that there will ever be hardware support for such a floating-point format, and it introduces boundary cases that portable programs should not have to be concerned with (for example, a non-integral DBL_MAX means that ceil() would have to worry about overflow). Therefore, this standard imposes an additional requirement that the largest representable finite value is integral.
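The property in this rationale can be stated in terms of the header's own values. Under the floating-point model, the largest finite value is (1 - b^-p) * b^emax, which equals (b^p - 1) * b^(emax - p). That product is an integer whenever emax is at least p, that is, whenever the *_MAX_EXP value is at least the corresponding *_MANT_DIG value. The following sketch, with a hypothetical function name, checks that condition at preprocessing time for all three types.

```c
#include <float.h>

/* Return 1 if, for each floating type, the maximum exponent is at least
 * the number of significand digits, so that the largest finite value
 * (b^p - 1) * b^(emax - p) is integral; return 0 otherwise. */
int largest_finite_values_are_integral(void)
{
#if FLT_MAX_EXP >= FLT_MANT_DIG && DBL_MAX_EXP >= DBL_MANT_DIG \
    && LDBL_MAX_EXP >= LDBL_MANT_DIG
    return 1;
#else
    return 0;
#endif
}
```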

FUTURE DIRECTIONS

None.

SEE ALSO

<complex.h>, <math.h>, <stdio.h>, <stdlib.h>, <wchar.h>

CHANGE HISTORY

First released in Issue 4. Derived from the ISO C standard.

Issue 6

The description of the operations with floating-point values is updated for alignment with the ISO/IEC 9899:1999 standard.

Issue 7

ISO/IEC 9899:1999 standard, Technical Corrigendum 2 #4 (SD5-XBD-ERN-50) and #5 (SD5-XBD-ERN-51) are applied.

POSIX.1-2008, Technical Corrigendum 1, XBD/TC1-2008/0046 [346] and XBD/TC1-2008/0047 [346] are applied.

End of informative text.

 


UNIX ® is a registered Trademark of The Open Group.
POSIX ® is a registered Trademark of The IEEE.