Half-Precision Floating Point

6.12 Half-Precision Floating Point

On ARM targets, GCC supports half-precision (16-bit) floating point via the __fp16 type. You must enable this type explicitly with the -mfp16-format command-line option in order to use it.

ARM supports two incompatible representations for half-precision floating-point values. You must choose one of the representations and use it consistently in your program.
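One way to guard against mixing the two representations is to check at build time which one the compiler selected. The following is a minimal sketch, assuming the compiler predefines the ACLE macros __ARM_FP16_FORMAT_IEEE and __ARM_FP16_FORMAT_ALTERNATIVE. These macros are not mentioned in this section, and older GCC releases may not define them. The helper name fp16_format_name is hypothetical.

```c
/* Report which __fp16 representation the compiler selected.
   Assumption: the ACLE macros __ARM_FP16_FORMAT_IEEE and
   __ARM_FP16_FORMAT_ALTERNATIVE are predefined when -mfp16-format
   is given. If neither is defined (for example on a non-ARM target,
   or when the option is omitted), the function reports "none". */
const char *fp16_format_name(void)
{
#if defined(__ARM_FP16_FORMAT_IEEE)
    return "ieee";
#elif defined(__ARM_FP16_FORMAT_ALTERNATIVE)
    return "alternative";
#else
    return "none";
#endif
}
```

A library that exchanges __fp16 data with other code could call such a helper, or use the same #if tests with #error, to reject builds that use an unexpected format.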

Specifying -mfp16-format=ieee selects the IEEE 754-2008 format. This format can represent normalized values in the range of 2^-14 to 65504. There are 11 bits of significand precision, approximately 3 decimal digits.
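The range limits quoted above follow from the binary16 bit layout: 1 sign bit, 5 exponent bits with a bias of 15, and 10 stored fraction bits (11 bits of precision with the implicit leading 1). The sketch below decodes a raw 16-bit pattern in portable C so the limits can be checked on any host. It does not use __fp16, the function names are hypothetical, and infinities and NaNs are deliberately left unhandled.

```c
#include <stdint.h>

/* Scale x by 2^e using repeated multiplication (avoids linking libm). */
static float scale_pow2(float x, int e)
{
    while (e > 0) { x *= 2.0f; e--; }
    while (e < 0) { x *= 0.5f; e++; }
    return x;
}

/* Decode a finite IEEE 754-2008 binary16 bit pattern into a float.
   Exponent field 31 (infinities and NaNs) is not handled in this sketch. */
float half_ieee_to_float(uint16_t h)
{
    int sign = (h >> 15) & 0x1;
    int exp  = (h >> 10) & 0x1F;   /* biased exponent, bias = 15 */
    int frac = h & 0x3FF;          /* 10 stored fraction bits */
    float mag;

    if (exp == 0)
        mag = scale_pow2((float)frac, -24);                 /* subnormal: frac * 2^-24 */
    else
        mag = scale_pow2((float)(0x400 | frac), exp - 25);  /* (1024 + frac) * 2^(exp - 15 - 10) */

    return sign ? -mag : mag;
}
```

With this decoder, the pattern 0x0400 (exponent field 1, zero fraction) yields the smallest normalized value 2^-14, and 0x7BFF (exponent field 30, all fraction bits set) yields the largest finite value 65504.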

Specifying -mfp16-format=alternative selects the ARM alternative format. This representation is similar to the IEEE format, but does not support infinities or NaNs. Instead, the range of exponents is extended, so that this format can represent normalized valu
