Index

Grâl Inter-Platform Message Format

Abstract

This document describes the binary encoding of messages allowing communication between different hardware platforms.

Overview

The uniform inter-platform message (IPM) format is used for communication between processes on systems with different representations of hardware data types.

There are two approaches to platform-independent communications. The simplest (and most widely used) approach is to choose some "standard" way of encoding elementary data types (usually referred to as network format), encode messages into that format before transmission and decode them back into host format on reception.

However, this approach is inefficient when host data format is not the same as network data format. In this case exchange of messages within the host (or between nodes of a MIMD machine) causes costly and completely unnecessary conversions to network format and back. As a result this approach cannot be used to provide a uniform way for communication with both local and remote processes.

The second approach involves tagging messages with a description of how data is represented within those messages, so that the sender may transmit messages in its local host format, without any encoding. If the sender's data representation corresponds to the recipient's internal representation, no conversion on receipt is necessary. If the representations are different, the recipient must use the sender's data format description to convert the message to the local format.
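
The tagging idea can be illustrated with a minimal sketch in C. It is not the Grâl implementation: the one-value tag, the names local_tag and receive_u32, and the restriction to 32-bit elements are assumptions made only for this illustration. The receiver compares the sender's tag with its own and reorders octets only when they differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag values: they stand in for a real format description. */
enum { TAG_BIG_ENDIAN = 0, TAG_LITTLE_ENDIAN = 1 };

/* Detect the byte order of the machine running this code. */
static int local_tag(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe ? TAG_LITTLE_ENDIAN : TAG_BIG_ENDIAN;
}

/* Convert `count` 32-bit elements in place, but only if the sender's
 * representation differs from ours. Returns the number of elements
 * that actually had to be converted. */
static size_t receive_u32(int sender_tag, uint32_t *data, size_t count)
{
    if (sender_tag == local_tag())
        return 0;                       /* same format: nothing to do */
    for (size_t i = 0; i < count; i++) {
        uint32_t v = data[i];
        data[i] = (v >> 24) | ((v >> 8) & 0xFF00u) |
                  ((v << 8) & 0xFF0000u) | (v << 24);
    }
    return count;
}
```

When both processes run on the same host, receive_u32 returns immediately, which is the saving this approach offers over a fixed network format.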

This approach requires all hosts to be able to convert data from all other hosts' representations. Although this may seem very difficult, in reality practically all computers have similar data formats, differing mostly in such details as byte order and sizes of data types. The Grâl IPM data format description covers practically all modern machines. In any case, a program running on a truly unusual machine may revert to pre-encoding messages to make them parseable by Grâl message recipients.

The Grâl message format is designed to simplify conversions when communicating hosts belong to the dominant species of binary computers with integer type sizes being powers of 2, twos-complement representation for negative integers and IEEE Std 754-like floating point numbers. To make conversions more efficient some restrictions are placed on ranges of values and alignment of elements within messages.

Elementary Data Types

Grâl inter-platform messages are composed from the following elementary data types (names of types below are symbolic and may vary in different languages):

Type Name   Range Of Values                  Precision
char        -(2^7-1)      to  2^7-1
uchar       0             to  2^8-1
short       -(2^15-1)     to  2^15-1
ushort      0             to  2^16-1
long        -(2^31-1)     to  2^31-1
ulong       0             to  2^32-1
xlong       -(2^63-1)     to  2^63-1
uxlong      0             to  2^64-1
float       -3.4x10^38    to  3.4x10^38      24 bits
double      -1.79x10^308  to  1.79x10^308    53 bits

The corresponding host data types should be able to accommodate all values within the specified ranges; transmission of values outside of those ranges is not guaranteed. Note that signed integer values do not include -2^(n-1) (where n is the number of bits) as it cannot be represented on machines with ones-complement negative numbers. Conversely, -0 is converted to +0 if the destination host uses twos-complement representation. To avoid conversion problems, bit masks must always be represented as unsigned values.
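
The two rules above (the symmetric range and the mapping of -0 to +0) can be sketched in C as follows. The function names are hypothetical and the example assumes a 16-bit short; it only illustrates the arithmetic, not the actual Grâl conversion code.

```c
#include <stdint.h>

/* True if v fits the IPM range for a 16-bit short: -(2^15-1) .. 2^15-1.
 * Note that -32768 is deliberately excluded. */
static int ipm_short_in_range(long v)
{
    return v >= -32767L && v <= 32767L;
}

/* Convert a 16-bit ones-complement bit pattern to a host integer.
 * In ones-complement a negative number is the bitwise inverse of its
 * magnitude, so the all-ones pattern (-0) naturally becomes +0. */
static int32_t from_ones_complement16(uint16_t bits)
{
    if (bits & 0x8000u)
        return -(int32_t)(~(uint32_t)bits & 0xFFFFu);
    return (int32_t)bits;
}
```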

Floating point numbers may be reduced in precision or reduced to infinities or zeroes in the process of conversion; however it is safe to assume that there won't be any unnecessary reduction of precision or limitation of range. Preservation of NaNs, infinite values and unnormalized numbers is not guaranteed.

Host representations of longer data types must be at least as large as those of the corresponding shorter types. That is, the number of bits in the host representation of long cannot be less than in the representation of short, but can be less than in the representation of ushort.

Data format conversion does not include transliteration or any other conversion of character strings. If any such conversion is necessary, it is the responsibility of the application.

Special care must be taken to ensure that elements within messages are always aligned on a boundary corresponding to their size, relative to the beginning of the message, even if the local host hardware does not require such alignment. Composite elements (arrays, structures, unions) must always be aligned according to the size of their largest elementary member.
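
As a rough illustration of the alignment rule, the sketch below rounds an offset up to the next multiple of an element's size. The name align_up is hypothetical, and the reading that "aligned on a boundary corresponding to its size" means "offset is a multiple of the size" is an assumption. Offsets and sizes are in bits, measured from the beginning of the message.

```c
#include <stddef.h>

/* Round `offset` up to the next multiple of `size` (both in bits).
 * Works for sizes that are not powers of two, e.g. 36-bit longs. */
static size_t align_up(size_t offset, size_t size)
{
    return (offset + size - 1) / size * size;
}
```

For example, with a 16-bit header followed by an 8-bit uchar, the next 32-bit long would start at align_up(24, 32), that is, at bit 32, leaving 8 bits of padding.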

Message Format

An application programmer must reserve a special header field at the beginning of the message structure for the description of the local binary representation, and initialize that field with the local host's constant before sending the message. That field takes 2 octets (groups of 8 bits) if the source host has 8/16/32/64-bit integer numbers and IEEE Std 754-compliant floating point numbers. Otherwise, the long (16-octet) field should be used.

Message conversion must not cause more than a two-fold increase in message size (plus 14 octets if the longer version of the header is required); this limits the choice of hardware data types used to represent elementary data types. This limitation makes it possible to pre-allocate sufficient memory for decoding incoming messages (if messages are variable-size) or for receiving messages (if messages are fixed-size). Messages originating from machines with standard integer sizes and IEEE 754 floating point formats will have the minimal possible size.
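
The size bound can be turned into a simple worst-case buffer calculation. The sketch below is an assumption-laden illustration: the name ipm_decode_buffer_size is hypothetical, and it always adds the 14 extra octets so that the result is a safe upper bound whether or not the longer header turns out to be needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Worst-case size, in octets, of a decoded message given the size of
 * the incoming one: twice the size plus 14 octets for a long header.
 * Returns 0 if the result would not fit in size_t. */
static size_t ipm_decode_buffer_size(size_t incoming_size)
{
    if (incoming_size > (SIZE_MAX - 14) / 2)
        return 0;
    return 2 * incoming_size + 14;
}
```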

The format of the full 16-octet header is:

Octet #  Type                  Description
0        bitmask               Integer format flags
1        bitmask               Floating point format flags
2        unsigned              Total size of float, bits
3        unsigned              float's significand size, bits (excluding integer bit)
4        2s-complement signed  Exponent bias value defect for float
5        unsigned              Total size of double, bits
6        unsigned              double's significand size, bits (excluding integer bit)
7        2s-complement signed  Exponent bias value defect for double
8        unsigned              Size of char, bits
9        unsigned              Size of uchar, bits
10       unsigned              Size of short, bits
11       unsigned              Size of ushort, bits
12       unsigned              Size of long, bits
13       unsigned              Size of ulong, bits
14       unsigned              Size of xlong, bits
15       unsigned              Size of uxlong, bits
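
One possible way to unpack the 16 header octets into a C structure is sketched below. The structure and function names are hypothetical (the real definitions live in the Grâl headers and may differ), and the code assumes the header has already been read into an array of 8-bit octets.

```c
#include <stddef.h>

/* Hypothetical in-memory form of the full 16-octet header. */
struct ipm_full_header {
    unsigned int_flags;           /* octet 0 */
    unsigned fp_flags;            /* octet 1 */
    unsigned float_bits;          /* octet 2 */
    unsigned float_sig_bits;      /* octet 3 */
    int      float_bias_defect;   /* octet 4, signed */
    unsigned double_bits;         /* octet 5 */
    unsigned double_sig_bits;     /* octet 6 */
    int      double_bias_defect;  /* octet 7, signed */
    unsigned int_bits[8];         /* octets 8-15: char, uchar, short,
                                     ushort, long, ulong, xlong, uxlong */
};

/* Interpret one octet as a twos-complement value in -128..127. */
static int octet_to_signed(unsigned char o)
{
    return o >= 128 ? (int)o - 256 : (int)o;
}

static void parse_full_header(const unsigned char h[16],
                              struct ipm_full_header *out)
{
    out->int_flags          = h[0];
    out->fp_flags           = h[1];
    out->float_bits         = h[2];
    out->float_sig_bits     = h[3];
    out->float_bias_defect  = octet_to_signed(h[4]);
    out->double_bits        = h[5];
    out->double_sig_bits    = h[6];
    out->double_bias_defect = octet_to_signed(h[7]);
    for (size_t i = 0; i < 8; i++)
        out->int_bits[i] = h[8 + i];
}
```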

The short (2-octet) message header is used when the most significant bits (0200) of both the integer and floating point format octets are zero (see definitions below). The format of the short message header is:

Octet #  Type     Description
0        bitmask  Integer format flags
1        bitmask  Floating point format flags
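
A receiver can tell the two header forms apart by looking at the 0200 bit of the first two octets. The sketch below shows one way to do that; the function name ipm_header_size is an assumption, not part of the Grâl API.

```c
#include <stddef.h>

/* Return the header length in octets: 2 when the 0200 bit is clear in
 * both flag octets, 16 when it is set in either of them. */
static size_t ipm_header_size(const unsigned char *msg)
{
    return ((msg[0] | msg[1]) & 0200) ? 16 : 2;
}
```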

Integer format flags (octet 0) are:

Bit (octal)  Meaning If Clear                 Meaning If Set
1            Integers are big-endian (most    Integers are little-endian (most
             significant byte is first)       significant byte is last)
2            Negative numbers are             Negative numbers are
             twos-complement                  ones-complement
4            Natural order of bytes           short-sized pairs are swapped
                                              in long integers
200          Integer sizes are                Integer sizes are in
             8/16/32/64 bits                  header octets 8-15
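
To make the flag bits concrete, here is a sketch that reads a 32-bit unsigned value from four octets according to the integer format flags. The macro and function names are hypothetical. The interpretation of bit 4 is also an assumption: the sketch treats it as exchanging the two 16-bit halves of the value after the octets have been assembled in the order given by bit 1.

```c
#include <stdint.h>

/* Hypothetical names for the integer format flag bits (octal). */
#define IPM_INT_LITTLE_ENDIAN 0001
#define IPM_INT_ONES_COMPL    0002
#define IPM_INT_PAIRS_SWAPPED 0004
#define IPM_INT_SIZES_IN_HDR  0200

/* Assemble a 32-bit unsigned value from 4 octets sent by a host whose
 * integer format flags are `flags`. Assumes 8/16/32/64-bit sizes. */
static uint32_t read_ulong32(const unsigned char *p, unsigned flags)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        /* Visit octets from most significant to least significant. */
        int idx = (flags & IPM_INT_LITTLE_ENDIAN) ? 3 - i : i;
        v = (v << 8) | p[idx];
    }
    if (flags & IPM_INT_PAIRS_SWAPPED)
        v = (v << 16) | (v >> 16);      /* exchange 16-bit halves */
    return v;
}
```

Under this reading, a little-endian host with swapped pairs stores 0x0A0B0C0D as the octets 0B 0A 0D 0C.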

The implicit assumptions about integer data representations are:

If those conditions are not met, the native host data format is not acceptable, and messages sourced by such a host must be pre-encoded.

There are no restrictions on relative bit sizes of integer data types, except for the requirement that longer types should not have shorter representations. This means that sizes of long types do not have to be divisible by sizes of shorter types; e.g. a combination of 12-bit char, 18-bit short and 36-bit long is acceptable. Field alignment is always counted in bits, not bytes.

Floating-point format flags (octet 1) are:

Bit(s) (octal)  Meaning If Clear                  Meaning If Set
1               FP numbers are big-endian         FP numbers are little-endian
                (most significant byte is first)  (most significant byte is last)
6               Binary exponent base as follows:

                Bits  Exponent Base
                0     2
                2     4
                4     8
                6     16

10              Exponent base is 2^n              Exponent base is 10, binary
                                                  exponent base bits must be 0
20              Exponent is unsigned and          Exponent is ones-complement
                biased                            signed
40              Fields of FP are laid out as:     Fields of FP are laid out as:
100             Integer bit of significand        Integer bit of significand
                is hidden                         is explicit
200             FP data types are                 FP data types are not
                IEEE 754-compliant, bits          IEEE 754-compliant (values of
                0176 must be clear                header octets 2-7 are used)

The floating point numbers are assumed to be represented as

sign * exponent_base^exponent * significand
where sign can be 1 or -1, and significand is a fixed-point binary (or binary-coded decimal, if exponent base is 10) number which is either 0 or 1+x where x is non-negative and less than 1. (In other words, floating point numbers are assumed to be normalized). If the integer part of significand is hidden, zero is represented with minimal value of exponent and zero fractional part of significand.

Exponent base may be 2, 4, 8, 16 or 10. The exponent may be represented as a ones-complement number, with the most significant bit of the exponent field used as its sign:

E = (sign exponent * abs exponent) - bias_defect;
or as a biased unsigned number:
E = exponent + 2^(n-1) - 1 - bias_defect,
where n is the number of bits in the exponent field.

A zero sign bit is assumed to mean a positive value; the same applies to the exponent sign bit if the exponent field is ones-complement. If the host's native floating point data format does not conform to these assumptions, additional pre-encoding must be used.
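
The following sketch applies the value formula and the biased-exponent formula to a bit pattern of at most 32 bits. It rests on several assumptions that the text above does not settle: E is read as the stored exponent field, the fields are taken to be laid out as sign, exponent, fraction from the most significant bit down, the integer bit is hidden, and the function names are hypothetical. Unnormalized numbers, infinities and NaNs are not handled.

```c
#include <stdint.h>

/* Multiply x by base^e using repeated multiplication or division. */
static double scale(double x, int base, int e)
{
    for (; e > 0; e--) x *= base;
    for (; e < 0; e++) x /= base;
    return x;
}

/* Decode a float with a hidden integer bit and a biased exponent.
 * total_bits: size of the type (at most 32 here);
 * frac_bits:  significand size excluding the integer bit (below 32);
 * base:       exponent base (2, 4, 8 or 16);
 * bias_defect: exponent bias value defect from the header. */
static double decode_biased_fp(uint32_t bits, int total_bits,
                               int frac_bits, int base, int bias_defect)
{
    int exp_bits = total_bits - 1 - frac_bits;
    uint32_t frac = bits & ((UINT32_C(1) << frac_bits) - 1);
    uint32_t E = (bits >> frac_bits) & ((UINT32_C(1) << exp_bits) - 1);
    int negative = (int)((bits >> (total_bits - 1)) & 1);

    if (E == 0 && frac == 0)
        return 0.0;   /* minimal exponent and zero fraction mean zero */

    /* Invert E = exponent + 2^(n-1) - 1 - bias_defect. */
    int exponent = (int)E - ((1 << (exp_bits - 1)) - 1) + bias_defect;
    double significand = 1.0 + scale((double)frac, 2, -frac_bits);
    double value = scale(significand, base, exponent);
    return negative ? -value : value;
}
```

With the IEEE 754 single-precision parameters (32 total bits, 23 fraction bits, base 2, zero defect), the pattern 0xC0200000 decodes to -2.5.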

Sending And Receiving Messages In C

The Grâl IPM format is defined in the include file <message.h>. Message structures should be declared as

struct ... { FORMAT_HEADER; .... }

where FORMAT_HEADER takes 2 or 16 octets, depending on the type of the local host.

Usually C types correspond to Grâl IPM types as follows:

IPM Type  C Type
char      signed char
uchar     unsigned char
short     signed short
ushort    unsigned short
long      signed long
ulong     unsigned long
xlong     signed long long
uxlong    unsigned long long
float     float
double    double

Note that the C type int is not used; the reason is that its length is usually implementation-dependent even on "standard" machines. Most C compilers align fields according to the Grâl IPM requirements, but do not take that for granted.

Before a message is sent, its format header must be initialized with the description of the local machine's data types; this is done as

SET_FORMAT(message);

On reception, the message must be explicitly decoded:

DECODE(message, message_size, { conversion code ... });

where message is a pointer to the beginning of the incoming message and message_size is the size of the incoming message in bytes.

Conversion code is a segment of C code using the following macros:

Macro            Definition
CVT_CHAR         Convert one char
CVT_CHARS(n)     Convert n chars
CVT_UCHAR        Convert one uchar
CVT_UCHARS(n)    Convert n uchars
CVT_SHORT        Convert one short
CVT_SHORTS(n)    Convert n shorts
CVT_USHORT       Convert one ushort
CVT_USHORTS(n)   Convert n ushorts
CVT_LONG         Convert one long
CVT_LONGS(n)     Convert n longs
CVT_ULONG        Convert one ulong
CVT_ULONGS(n)    Convert n ulongs
CVT_XLONG        Convert one xlong
CVT_XLONGS(n)    Convert n xlongs
CVT_UXLONG       Convert one uxlong
CVT_UXLONGS(n)   Convert n uxlongs
CVT_FLOAT        Convert one float
CVT_FLOATS(n)    Convert n floats
CVT_DOUBLE       Convert one double
CVT_DOUBLES(n)   Convert n doubles
CVT_EOM          True if end of message is reached

Successive calls to conversion macros convert one or more data elements of the corresponding types in order, starting from the beginning of the message. Since the preceding fields have already been converted into the local host's representation, the conversion code may examine them and choose a different course of action. This can be used to decode messages containing unions or variable-size elements.

If a message contains only integer data elements, use IDECODE instead of DECODE, as it may produce significantly faster conversion code.

