Request For Comments: DRAFT                                Vadim Antonov
Category: Informational                                     Pluris, Inc.
                                                        10 February 1997


                Rosetta Language Specification, Symbols


Status Of This Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

1. Language Name

   Symbols Character Set.

2. Language Usage Data

   The symbols included are commonly used in scientific and business
   texts.

3. Assigned Language Number

   +-------------------------+---------+
   | Language Name           | Symbols |
   +-------------------------+---------+
   |Assigned Language Number |   15    |
   +-------------------------+---------+
   | Selector Sequence (oct) |   217   |
   +-------------------------+---------+

4. Direction Of Writing

   The primary direction of writing is horizontal, from left to right.
   When the end of a line is reached, a new line is started underneath
   the previous line, and writing resumes from the left side of the new
   line.

   In a multi-column text, the first column is on the left side of a
   page.

5. Rendering Of Numerals

   The numerals are rendered in a decimal positional system, using ASCII
   (language number 0) characters from DIGIT ZERO to DIGIT NINE, with
   the most significant position on the left.

Antonov                                                         [Page 1]

RFC DRAFT     Rosetta Language Specification, Symbols      February 1997

   A common sign for separating the integer and fractional parts is
   PERIOD; thousands (groups of 3 digits) are sometimes separated with
   COMMA.

   Vulgar fractions with denominators 2, 3, 4, 5, 6, 8, 10 and 12 can
   also be represented using the FRACTION symbols (see code page 100U).

6. Character Set

6.1. Symbols

   Symbols do not have upper and lower case representations.

   The encoding of the symbols is dual-octet; i.e. the first octet
   selects the code page, and the second octet selects the symbol within
   the code page.  For convenience, the code pages are split in pairs,
   corresponding to the lower case and upper case halves of an alphabet.
   Within pairs, those code pages are marked with letter L
   (corresponding second octet's values in range 100-176 octal), and U
   (300-376 octal).

6.1.1. Code Page 100L - Punctuation Signs

   This code page contains non-ASCII punctuation signs used in texts
   written in natural languages.

   +-----------+-----------------------------+
   |Octal Code | Character Name              |
   +-----------+-----------------------------+
   |  100 100  | DASH                        |
   |  100 101  | HAIR SPACE                  |
   |  100 102  | BLANK SIGN                  |
   +-----------+-----------------------------+
   |  100 104  | COMBINING UNDERSCORE        |
   |  100 105  | COMBINING OVERSTRIKE        |
   |  100 106  | COMBINING OVERSCORE         |
   +-----------+-----------------------------+
   |  100 110  | CENTER DOT                  |
   |  100 111  | CENTER DOUBLE DOT           |
   +-----------+-----------------------------+
   |  100 112  | INVERTED EXCLAMATION MARK   |
   |  100 113  | INVERTED QUESTION MARK      |
   +-----------+-----------------------------+
   |  100 114  | LEFT SINGLE QUOTATION MARK  |
   |  100 115  | RIGHT SINGLE QUOTATION MARK |
   +-----------+-----------------------------+
   |  100 116  | LEFT DOUBLE QUOTATION MARK  |
   |  100 117  | RIGHT DOUBLE QUOTATION MARK |
   +-----------+-----------------------------+
   |  100 120  | LEFT GUILLEMET              |
   |  100 121  | RIGHT GUILLEMET             |
   +-----------+-----------------------------+
   |  100 122  | LEFT ANGLE BRACKET          |
   |  100 123  | RIGHT ANGLE BRACKET         |
   +-----------+-----------------------------+
   |  100 124  | LEFT DOUBLE ANGLE BRACKET   |
   |  100 125  | RIGHT DOUBLE ANGLE BRACKET  |
   +-----------+-----------------------------+
   |  100 130  | FOOTNOTE ASTERISK           |
   |  100 131  | DAGGER                      |
   |  100 132  | DOUBLE DAGGER               |
   +-----------+-----------------------------+
   |  100 140  | RING                        |
   |  100 141  | BULLET                      |
   |  100 142  | SMALL SQUARE                |
   |  100 143  | SMALL BLACK SQUARE          |
   |  100 144  | SMALL DIAMOND               |
   |  100 145  | SMALL BLACK DIAMOND         |
   +-----------+-----------------------------+
   |  100 150  | CIRCLE                      |
   |  100 151  | BLACK CIRCLE                |
   |  100 152  | SQUARE                      |
   |  100 153  | BLACK SQUARE                |
   |  100 154  | DIAMOND                     |
   |  100 155  | BLACK DIAMOND               |
   +-----------+-----------------------------+

   The DASH symbol is wider than HYPHEN, and is usually as wide as
   ENGLISH LETTER SMALL M.

   The HAIR SPACE is 1 point (1/72 inch).

   A table of shapes of Punctuation Signs can be found at

      http://www.pluris.com/rosetta/symbols-100L.gif

6.1.2. Code Page 100U - Business Symbols

   This code page contains symbols commonly used in business and legal
   communications.

   +-----------+----------------------------+
   |Octal Code | Character Name             |
   +-----------+----------------------------+
   |  100 300  | BROKEN BAR                 |
   |  100 301  | SECTION SIGN               |
   |  100 302  | PARAGRAPH SIGN             |
   |  100 303  | COPYRIGHT SIGN             |
   |  100 304  | REGISTERED SIGN            |
   |  100 305  | TRADEMARK SIGN             |
   |  100 306  | TELEPHONE                  |
   |  100 307  | BLACK TELEPHONE            |
   +-----------+----------------------------+
   |  100 310  | CHECK MARK                 |
   |  100 311  | HEAVY CHECK MARK           |
   |  100 312  | SQUARE WITH CHECK MARK     |
   |  100 313  | SQUARE WITH X              |
   |  100 314  | PER MILLE SIGN             |
   +-----------+----------------------------+
   |  100 320  | BLACK LEFT-POINTING INDEX  |
   |  100 321  | BLACK RIGHT-POINTING INDEX |
   |  100 322  | BLACK UP-POINTING INDEX    |
   |  100 323  | BLACK DOWN-POINTING INDEX  |
   +-----------+----------------------------+
   |  100 324  | WHITE LEFT-POINTING INDEX  |
   |  100 325  | WHITE RIGHT-POINTING INDEX |
   |  100 326  | WHITE UP-POINTING INDEX    |
   |  100 327  | WHITE DOWN-POINTING INDEX  |
   +-----------+----------------------------+
   |  100 330  | FRACTION ONE HALF          |
   |  100 331  | FRACTION ONE THIRD         |
   |  100 332  | FRACTION TWO THIRDS        |
   |  100 333  | FRACTION ONE QUARTER       |
   |  100 334  | FRACTION THREE QUARTERS    |
   |  100 335  | FRACTION ONE FIFTH         |
   |  100 336  | FRACTION TWO FIFTHS        |
   |  100 337  | FRACTION THREE FIFTHS      |
   |  100 340  | FRACTION FOUR FIFTHS       |
   |  100 341  | FRACTION ONE SIXTH         |
   |  100 342  | FRACTION FIVE SIXTHS       |
   |  100 343  | FRACTION ONE EIGHTH        |
   |  100 344  | FRACTION THREE EIGHTHS     |
   |  100 345  | FRACTION FIVE EIGHTHS      |
   |  100 346  | FRACTION SEVEN EIGHTHS     |
   |  100 347  | FRACTION ONE TENTH         |
   |  100 350  | FRACTION THREE TENTHS      |
   |  100 351  | FRACTION SEVEN TENTHS      |
   |  100 352  | FRACTION NINE TENTHS       |
   |  100 353  | FRACTION ONE TWELFTH       |
   |  100 354  | FRACTION FIVE TWELFTHS     |
   |  100 355  | FRACTION SEVEN TWELFTHS    |
   |  100 356  | FRACTION ELEVEN TWELFTHS   |
   +-----------+----------------------------+
   |  100 360  | CURRENCY SIGN              |
   |  100 361  | CENT SIGN                  |
   |  100 362  | POUND SIGN                 |
   |  100 363  | YEN SIGN                   |
   +-----------+----------------------------+

   A table of shapes of Business Signs can be found at

      http://www.pluris.com/rosetta/symbols-100U.gif

6.1.3. Code Pages 101L and 101U - Mathematical Symbols

   These code pages contain symbols used in mathematical formulae.
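   As an illustration of the dual-octet encoding described in Section
   6.1, the following Python sketch splits an octal code, written as in
   the tables of this document, into its two octets and derives the
   code page label (for example 100L or 100U).  The function names are
   hypothetical and are not part of Rosetta; the sketch assumes only
   the second-octet ranges stated in Section 6.1 (100-176 octal for L,
   300-376 octal for U).

```python
# Second-octet ranges from Section 6.1 (upper bounds are inclusive).
LOWER_HALF = range(0o100, 0o177)  # 100-176 octal, code pages marked L
UPPER_HALF = range(0o300, 0o377)  # 300-376 octal, code pages marked U


def parse_octal_pair(text: str) -> tuple[int, int]:
    """Parse a table entry such as '100 330' into two octet values."""
    first, second = text.split()
    return int(first, 8), int(second, 8)


def code_page_label(page: int, symbol: int) -> str:
    """Return a label such as '100U' for a (code page, symbol) pair.

    Hypothetical helper: the labelling follows the L/U convention of
    Section 6.1; a second octet outside both ranges is rejected.
    """
    if symbol in LOWER_HALF:
        half = "L"
    elif symbol in UPPER_HALF:
        half = "U"
    else:
        raise ValueError(f"second octet {symbol:o} is outside both halves")
    return f"{page:o}{half}"


if __name__ == "__main__":
    # FRACTION ONE HALF is listed as 100 330 in code page 100U.
    print(code_page_label(*parse_octal_pair("100 330")))
```

   A lookup by range was chosen over a bit test so that the sketch
   depends only on the ranges the text actually states.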
   +-----------+----------------------------------------+
   |Octal Code | Character Name                         |
   +-----------+----------------------------------------+
   |  101 100  | MINUS SIGN                             |
   |  101 101  | MULTIPLICATION SIGN                    |
   |  101 102  | DIVISION SIGN                          |
   |  101 103  | PLUS-MINUS SIGN                        |
   |  101 104  | ROOT SIGN                              |
   |  101 105  | ROOT SIGN CONTINUATION                 |
   +-----------+----------------------------------------+
   |  101 107  | INFINITY                               |
   +-----------+----------------------------------------+
   |  101 110  | EQUIVALENT TO                          |
   |  101 111  | PROPORTION                             |
   |  101 112  | APPROXIMATELY EQUAL TO                 |
   |  101 113  | NOT EQUAL TO                           |
   |  101 114  | NOT APPROXIMATELY EQUAL TO             |
   |  101 115  | LESS THAN OR EQUAL TO                  |
   |  101 116  | GREATER THAN OR EQUAL TO               |
   |  101 117  | LESS THAN OR APPROXIMATELY EQUAL TO    |
   |  101 120  | GREATER THAN OR APPROXIMATELY EQUAL TO |
   |  101 121  | MUCH LESS THAN                         |
   |  101 122  | MUCH GREATER THAN                      |
   +-----------+----------------------------------------+
   |  101 130  | END OF PROOF                           |
   |  101 131  | FOR ALL                                |
   |  101 132  | THERE EXISTS                           |
   |  101 133  | THERE DOES NOT EXIST                   |
   |  101 134  | LOGICAL AND                            |
   |  101 135  | LOGICAL OR                             |
   |  101 136  | NEGATION                               |
   +-----------+----------------------------------------+
   |  101 140  | INTEGRAL                               |
   |  101 141  | DOUBLE INTEGRAL                        |
   |  101 142  | TRIPLE INTEGRAL                        |
   |  101 143  | CONTOUR INTEGRAL                       |
   |  101 144  | SURFACE INTEGRAL                       |
   |  101 145  | VOLUME INTEGRAL                        |
   |  101 146  | LEFT HALF-CONTOUR INTEGRAL             |
   |  101 147  | RIGHT HALF-CONTOUR INTEGRAL            |
   |  101 150  | CLOCKWISE CONTOUR INTEGRAL             |
   |  101 151  | COUNTER-CLOCKWISE CONTOUR INTEGRAL     |
   +-----------+----------------------------------------+
   |  101 152  | PARTIAL DIFFERENTIAL                   |
   |  101 153  | NABLA                                  |
   +-----------+----------------------------------------+
   |  101 154  | GRADUS SIGN                            |
   |  101 155  | MINUTE SIGN                            |
   |  101 156  | SECOND SIGN                            |
   +-----------+----------------------------------------+
   |  101 160  | ELEMENT OF                             |
   |  101 161  | NOT AN ELEMENT OF                      |
   |  101 162  | CONTAINS AS A MEMBER                   |
   |  101 163  | DOES NOT CONTAIN AS A MEMBER           |
   |  101 164  | SUBSET OF                              |
   |  101 165  | NOT A SUBSET OF                        |
   |  101 166  | SUPERSET OF                            |
   |  101 167  | NOT A SUPERSET OF                      |
   |  101 170  | SUBSET OF OR EQUAL TO                  |
   |  101 171  | NEITHER A SUBSET OF NOR EQUAL TO       |
   |  101 172  | SUPERSET OF OR EQUAL TO                |
   |  101 173  | NEITHER A SUPERSET OF NOR EQUAL TO     |
   |  101 174  | UNION                                  |
   |  101 175  | INTERSECTION                           |
   |  101 176  | EMPTY SET                              |
   +-----------+----------------------------------------+
   |  101 300  | CIRCLED PLUS                           |
   |  101 301  | CIRCLED MINUS                          |
   |  101 302  | CIRCLED TIMES                          |
   |  101 303  | CIRCLED DIVISION                       |
   |  101 304  | CIRCLED RING                           |
   |  101 305  | CIRCLED DOT                            |
   +-----------+----------------------------------------+
   |  101 310  | SQUARED PLUS                           |
   |  101 311  | SQUARED MINUS                          |
   |  101 312  | SQUARED TIMES                          |
   |  101 313  | SQUARED DIVISION                       |
   |  101 314  | SQUARED RING                           |
   |  101 315  | SQUARED DOT                            |
   +-----------+----------------------------------------+
   |  101 320  | ANGLE                                  |
   |  101 321  | SECTOR                                 |
   |  101 322  | RIGHT ANGLE                            |
   |  101 323  | EQUAL AND PARALLEL TO                  |
   |  101 324  | ARC ABOVE                              |
   |  101 325  | ANGLE ABOVE                            |
   +-----------+----------------------------------------+
   |  101 330  | LEFT CEILING                           |
   |  101 331  | RIGHT CEILING                          |
   |  101 332  | LEFT FLOOR                             |
   |  101 333  | RIGHT FLOOR                            |
   +-----------+----------------------------------------+
   |  101 340  | COMBINING DOT DERIVATIVE               |
   |  101 341  | COMBINING SECOND DOT DERIVATIVE        |
   |  101 342  | COMBINING TILDE                        |
   |  101 343  | COMBINING LEFTWARDS ARROW              |
   |  101 344  | COMBINING RIGHTWARDS ARROW             |
   +-----------+----------------------------------------+

   Tables of shapes of Mathematical Symbols can be found at

      http://www.pluris.com/rosetta/symbols-101L.gif
      http://www.pluris.com/rosetta/symbols-101U.gif

   MINUS SIGN should be of the same width as PLUS SIGN from the
   ASCII/English character set.

   COMBINING ARROW symbols should be compatible with COMBINING OVERSCORE
   from the punctuation code page.

   ROOT SIGN CONTINUATION is very much like COMBINING OVERSCORE, but
   does not combine (i.e. it always advances the current position).  It
   must be compatible with the ROOT SIGN.

6.1.4. Code Page 102U - Forms Symbols

   This code page contains symbols used to compose forms and tables,
   primarily in conjunction with fixed-width fonts.

   +-----------+-------------------------------------------+
   |Octal Code | Character Name                            |
   +-----------+-------------------------------------------+
   |  102 300  | FORMS LIGHT HORIZONTAL BAR                |
   |  102 301  | FORMS LIGHT VERTICAL BAR                  |
   |  102 302  | FORMS LIGHT HORIZONTAL TRIPLE DASH        |
   |  102 303  | FORMS LIGHT VERTICAL TRIPLE DASH          |
   +-----------+-------------------------------------------+
   |  102 304  | FORMS HEAVY HORIZONTAL BAR                |
   |  102 305  | FORMS HEAVY VERTICAL BAR                  |
   |  102 306  | FORMS HEAVY HORIZONTAL TRIPLE DASH        |
   |  102 307  | FORMS HEAVY VERTICAL TRIPLE DASH          |
   +-----------+-------------------------------------------+
   |  102 310  | FORMS LIGHT RIGHT AND LIGHT DOWN          |
   |  102 311  | FORMS LIGHT HORIZONTAL AND LIGHT DOWN     |
   |  102 312  | FORMS LIGHT LEFT AND LIGHT DOWN           |
   |  102 313  | FORMS LIGHT RIGHT AND LIGHT VERTICAL      |
   |  102 314  | FORMS LIGHT HORIZONTAL AND LIGHT VERTICAL |
   |  102 315  | FORMS LIGHT LEFT AND LIGHT VERTICAL       |
   |  102 316  | FORMS LIGHT RIGHT AND LIGHT UP            |
   |  102 317  | FORMS LIGHT HORIZONTAL AND LIGHT UP       |
   |  102 320  | FORMS LIGHT LEFT AND LIGHT UP             |
   +-----------+-------------------------------------------+
   |  102 321  | FORMS HEAVY RIGHT AND HEAVY DOWN          |
   |  102 322  | FORMS HEAVY HORIZONTAL AND HEAVY DOWN     |
   |  102 323  | FORMS HEAVY LEFT AND HEAVY DOWN           |
   |  102 324  | FORMS HEAVY RIGHT AND HEAVY VERTICAL      |
   |  102 325  | FORMS HEAVY HORIZONTAL AND HEAVY VERTICAL |
   |  102 326  | FORMS HEAVY LEFT AND HEAVY VERTICAL       |
   |  102 327  | FORMS HEAVY RIGHT AND HEAVY UP            |
   |  102 330  | FORMS HEAVY HORIZONTAL AND HEAVY UP       |
   |  102 331  | FORMS HEAVY LEFT AND HEAVY UP             |
   +-----------+-------------------------------------------+
   |  102 332  | FORMS LIGHT RIGHT AND HEAVY DOWN          |
   |  102 333  | FORMS LIGHT HORIZONTAL AND HEAVY DOWN     |
   |  102 334  | FORMS LIGHT LEFT AND HEAVY DOWN           |
   |  102 335  | FORMS LIGHT RIGHT AND HEAVY VERTICAL      |
   |  102 336  | FORMS LIGHT HORIZONTAL AND HEAVY VERTICAL |
   |  102 337  | FORMS LIGHT LEFT AND HEAVY VERTICAL       |
   |  102 340  | FORMS LIGHT RIGHT AND HEAVY UP            |
   |  102 341  | FORMS LIGHT HORIZONTAL AND HEAVY UP       |
   |  102 342  | FORMS LIGHT LEFT AND HEAVY UP             |
   +-----------+-------------------------------------------+
   |  102 343  | FORMS HEAVY RIGHT AND LIGHT DOWN          |
   |  102 344  | FORMS HEAVY HORIZONTAL AND LIGHT DOWN     |
   |  102 345  | FORMS HEAVY LEFT AND LIGHT DOWN           |
   |  102 346  | FORMS HEAVY RIGHT AND LIGHT VERTICAL      |
   |  102 347  | FORMS HEAVY HORIZONTAL AND LIGHT VERTICAL |
   |  102 350  | FORMS HEAVY LEFT AND LIGHT VERTICAL       |
   |  102 351  | FORMS HEAVY RIGHT AND LIGHT UP            |
   |  102 352  | FORMS HEAVY HORIZONTAL AND LIGHT UP       |
   |  102 353  | FORMS HEAVY LEFT AND LIGHT UP             |
   +-----------+-------------------------------------------+
   |  102 354  | FORMS LIGHT CROSS WITH HEAVY RIGHT        |
   |  102 355  | FORMS LIGHT CROSS WITH HEAVY LEFT         |
   |  102 356  | FORMS LIGHT CROSS WITH HEAVY UP           |
   |  102 357  | FORMS LIGHT CROSS WITH HEAVY DOWN         |
   +-----------+-------------------------------------------+
   |  102 360  | FORMS HEAVY CROSS WITH LIGHT RIGHT        |
   |  102 361  | FORMS HEAVY CROSS WITH LIGHT LEFT         |
   |  102 362  | FORMS HEAVY CROSS WITH LIGHT UP           |
   |  102 363  | FORMS HEAVY CROSS WITH LIGHT DOWN         |
   +-----------+-------------------------------------------+
   |  102 364  | FORMS LIGHT LEFT UP AND HEAVY RIGHT DOWN  |
   |  102 365  | FORMS LIGHT LEFT DOWN AND HEAVY RIGHT UP  |
   |  102 366  | FORMS LIGHT RIGHT UP AND HEAVY LEFT DOWN  |
   |  102 367  | FORMS LIGHT RIGHT DOWN AND HEAVY LEFT UP  |
   +-----------+-------------------------------------------+
   |  102 370  | FORMS BLACK BLOCK                         |
   |  102 371  | FORMS DARK GREY BLOCK                     |
   |  102 372  | FORMS GREY BLOCK                          |
   |  102 373  | FORMS LIGHT GREY BLOCK                    |
   +-----------+-------------------------------------------+

   A table of shapes of Forms Symbols can be found at

      http://www.pluris.com/rosetta/symbols-102U.gif

6.2. Digits

   The ASCII (European "Arabic") digits are used.  The shapes of digits
   appearing in words with the Symbols language code should be identical
   to the shapes of the corresponding ASCII/English digits.

7. Hints

   The only defined hint code is 300 octal, HYPHENATION POINT.

   The HYPHENATION POINT hint is not displayed (although it can be shown
   during the input process or in text editors), and is used as an
   instruction to rendering engines to hyphenate a sequence of symbols
   at the specified hyphenation points.  Sequences of symbols are not
   hyphenated automatically.

   When a HYPHENATION POINT hint precedes the first character of a word
   and no other hyphenation points are specified, the sequence cannot be
   hyphenated.

   Hyphenation of a sequence of symbols does not add HYPHEN at the end
   of the line.

8. Word Comparison And Case Conversion

   The algorithms for comparison of Symbols words, and for conversion
   between upper and lower cases, are the same as the Rosetta default:
   lexicographical comparison by octet values, and null case conversion,
   as described in Rosetta Encoding For Multi-Lingual Texts.

9. Special Rendition Techniques

   Symbols are always output without any modifications, except for the
   Forms and combining symbols in conjunction with variable-width fonts,
   as described below.

   A combining symbol is printed over the preceding character (on the
   left).  The width of a combining symbol is adjusted to be the same as
   the width of the overprinted character; this may require modification
   of its shape.  In a sequence of characters overprinted with combining
   symbols, there should be no intervals between the shapes of the
   combining symbols.

   Width adjustment of a combining symbol is performed by replication of
   the leftmost and rightmost columns of pixels (so the resulting shape
   is centered with regard to the overprinted character).  No width
   adjustment of combining symbols is needed if fixed-width fonts are
   used.

   The Forms symbols are mostly intended for use with fixed-width fonts;
   however, in order to reproduce text correctly if variable-width fonts
   are used, the following procedure is recommended:

   (a) Adjust the height of the Forms symbols to reach the top of the
       following line, and the bottom of the previous line (if
       applicable).

   (b) When the previous line contains no Forms symbols, adjust the
       width of the FORMS HORIZONTAL BAR symbols to the maximal
       character width of the currently used fonts, and output the line.

   (c) If the previous line contains Forms symbols with downward
       pointing lines, match the positions of the Forms elements that
       have the corresponding upward pointing lines, by appropriate
       adjustment of the width of the HORIZONTAL BAR symbols.  If
       matching is not possible, use the maximal width for output of the
       Forms symbols.

   An alternative approach to rendering Forms symbols with proportional
   fonts is to calculate the positions of the Forms characters in a
   number of consecutive lines, and then perform multiple-line matching,
   so the widths of the resulting columns are minimized.
   This approach also allows replacing the rendering of Forms characters
   with drawing lines between the calculated positions.

   Typesetting of mathematical formulae generally requires the ability
   to position characters arbitrarily on a medium, and the ability to
   select different point sizes.  Such rendering is not supported by the
   purely textual Rosetta encoding, and requires an additional
   typesetting control language.

10. Input From Keyboard

   There is no recommended method for input from the keyboard.  An
   alternative method, such as selection from an on-screen menu, may be
   preferable.

11. Conversion To/From Other Character Sets

   Conversion tables from other character sets used in national
   representations can be found in the corresponding Language
   Specification documents.

12. Changes From The Previous Revision

   This is the initial version of the document.  Frequent additions are
   expected.

13. Security Considerations

   Not addressed in this document.

14. References

   [RFC XXX] V. Antonov, Rosetta Encoding For Multi-Lingual Texts, 1/97.

   [RFC XXX] V. Antonov, Rosetta Language Specification, ASCII/English,
             1/97.

15. Author's Address

   Vadim Antonov
   Pluris, Inc.
   2307 Coronet Blvd.
   Belmont, CA 94002

   e-mail: avg@pluris.com
   fax: +1 (415) 654-9222
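   The width adjustment of combining symbols described in Section 9
   (replication of the leftmost and rightmost columns of pixels) can be
   sketched as follows.  This is an illustration only, not a normative
   algorithm: the bitmap representation (one string per pixel row), the
   function name, and the choice to give the extra column to the right
   side when the added width is odd are all assumptions made for the
   example.  The memo does not say what to do when the overprinted
   character is narrower than the combining symbol, so the sketch
   simply rejects that case.

```python
def widen_glyph(rows: list[str], target_width: int) -> list[str]:
    """Widen a bitmap glyph by replicating its edge pixel columns.

    rows: hypothetical bitmap, one string per pixel row, equal lengths.
    target_width: width of the overprinted character, in pixels.
    """
    if not rows:
        return []
    width = len(rows[0])
    if width == 0 or any(len(row) != width for row in rows):
        raise ValueError("glyph rows must be non-empty and of equal width")
    extra = target_width - width
    if extra < 0:
        # Assumption: narrowing is out of scope for this sketch.
        raise ValueError("target width is smaller than the glyph width")
    # Split the added columns between both sides so the original shape
    # stays centred; an odd remainder goes to the right (assumption).
    left = extra // 2
    right = extra - left
    return [row[0] * left + row + row[-1] * right for row in rows]


if __name__ == "__main__":
    # A 4-pixel overscore stretched over a 7-pixel-wide character.
    for row in widen_glyph(["####", "...."], 7):
        print(row)
```

   Because only the edge columns are repeated, a bar-like symbol such
   as COMBINING OVERSCORE stays continuous across adjacent characters,
   which matches the requirement that there be no intervals between
   the shapes of combining symbols.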