Category: Update

Plan for Unicode support

22 September, 2012 | By Nenad Rakocevic

Red is growing up fast, even though it was born only two weeks ago! It is time to implement basic string support so that we can write our first real hello-world program. ;-)

Red strings will natively support Unicode. To achieve that efficiently and across platforms, we need a good plan. Here are the native Unicode formats used by the APIs of our main target platforms:

    Windows         : UTF-16
    Linux           : UTF-8
    Mac OS X/Cocoa  : UTF-16
    Mac OS X/Darwin : UTF-8
    Java            : UTF-16
    .NET            : UTF-16
    JavaScript      : UTF-16
    Syllable        : UTF-8

All of these are variable-width encodings: a single codepoint may span one or more code units. Accessing a character by index therefore means walking through the string from the beginning, which is an O(n) operation.
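As an illustration of that cost, here is a minimal Python sketch (the function name `utf8_byte_offset` is hypothetical, and valid UTF-8 input is assumed) that finds where the Nth codepoint starts in a UTF-8 buffer. It has no choice but to inspect every byte before the target.

```python
def utf8_byte_offset(data: bytes, index: int) -> int:
    """Return the byte offset of the codepoint at `index` in UTF-8 `data`.

    Every byte up to the target must be inspected, so the cost is O(n).
    Assumes `data` is valid UTF-8.
    """
    count = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a codepoint.
        if byte & 0xC0 != 0x80:
            if count == index:
                return offset
            count += 1
    raise IndexError("codepoint index out of range")


text = "héllo wörld".encode("utf-8")
print(utf8_byte_offset(text, 7))  # 8, because 'é' takes two bytes
```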

Fortunately, there are also fixed-width Unicode encodings that restore constant-time indexed access. To keep strings as space-efficient as possible, Red will internally support only these encoding formats:

    Latin-1 (1 byte/codepoint)
    UCS-2   (2 bytes/codepoint)
    UCS-4   (4 bytes/codepoint)

This approach is not new: Python 3.3, for example, uses the same scheme (PEP 393, "Flexible String Representation").
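With a fixed-width format, a codepoint's position is simply index × unit size, so lookup takes O(1). Latin-1 covers U+0000 through U+00FF, UCS-2 covers the Basic Multilingual Plane (U+0000 through U+FFFF), and UCS-4 covers the full Unicode range. The sketch below assumes little-endian storage and uses hypothetical names (`codepoint_at`, `UNIT_SIZE`). It only illustrates the idea and is not Red's actual implementation.

```python
import struct

# Bytes per codepoint for each fixed-width internal format.
UNIT_SIZE = {"latin-1": 1, "ucs-2": 2, "ucs-4": 4}
# struct codes for little-endian unsigned integers of each width.
STRUCT_CODE = {1: "<B", 2: "<H", 4: "<I"}


def codepoint_at(buffer: bytes, fmt: str, index: int) -> int:
    """Read the codepoint at `index` in O(1): its offset is index * unit size."""
    size = UNIT_SIZE[fmt]
    offset = index * size
    if offset < 0 or offset + size > len(buffer):
        raise IndexError("codepoint index out of range")
    (value,) = struct.unpack_from(STRUCT_CODE[size], buffer, offset)
    return value


# For BMP-only text, UTF-16LE bytes are identical to UCS-2 (little-endian).
ucs2 = "héllo wörld".encode("utf-16-le")
print(chr(codepoint_at(ucs2, "ucs-2", 7)))  # ö
```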

Additionally, UTF-8 and UTF-16 codecs will be supported to handle I/O with the host platforms.
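A UTF-16 codec is still needed even though UCS-2 uses 16-bit units, because codepoints above U+FFFF become surrogate pairs in UTF-16. The minimal Python sketch below encodes a sequence of codepoints into UTF-16 code units. The function name `utf16_units` is hypothetical.

```python
def utf16_units(codepoints):
    """Encode an iterable of codepoints as a list of 16-bit UTF-16 code units."""
    units = []
    for cp in codepoints:
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"not a Unicode scalar value: {cp:#x}")
        if cp < 0x10000:
            units.append(cp)  # BMP: one unit, identical to UCS-2
        else:
            v = cp - 0x10000                    # 20 bits to spread over two units
            units.append(0xD800 | (v >> 10))   # high surrogate
            units.append(0xDC00 | (v & 0x3FF))  # low surrogate
    return units


print([hex(u) for u in utf16_units([0x41, 0x1F600])])  # ['0x41', '0xd83d', '0xde00']
```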

By default, Red will use UTF-8 to exchange strings with the outside world, except when a UTF-16 API has to be called. Input and output strings will be converted on the fly between one of the internal representations and UTF-8 or UTF-16. When reading an input string, Red will select the most space-efficient internal format based on the highest codepoint in that string. Users should also be able to force a string into a given internal format, when possible.
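The selection rule can be sketched in a few lines of Python. This is a hypothetical illustration, not Red's code. It assumes little-endian storage and uses a Python `str` as an intermediate step for brevity, where a real implementation would decode directly into the target buffer. The names `pick_format` and `load_utf8`, along with the `force` parameter, are all invented for this example. Forcing a format only succeeds when that format can hold every codepoint, which is one reading of "when possible".

```python
# Python codec used to produce each internal format (little-endian assumed).
CODEC = {"latin-1": "latin-1", "ucs-2": "utf-16-le", "ucs-4": "utf-32-le"}
RANK = {"latin-1": 0, "ucs-2": 1, "ucs-4": 2}


def pick_format(text: str) -> str:
    """Choose the narrowest fixed-width format able to hold every codepoint."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return "latin-1"
    if highest <= 0xFFFF:
        return "ucs-2"
    return "ucs-4"


def load_utf8(data: bytes, force=None):
    """Decode UTF-8 input into (format, buffer), optionally forcing a wider format."""
    text = data.decode("utf-8")
    fmt = pick_format(text)
    if force is not None:
        if RANK[force] < RANK[fmt]:
            raise ValueError(f"{force} cannot hold every codepoint of this string")
        fmt = force
    return fmt, text.encode(CODEC[fmt])


print(load_utf8("hello".encode("utf-8")))          # ('latin-1', b'hello')
print(load_utf8("héllo €".encode("utf-8"))[0])     # ucs-2 (€ is U+20AC)
print(load_utf8("hi \U0001F600".encode("utf-8"))[0])  # ucs-4
```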

That is the plan so far for adding Unicode support to Red. A prototype implementation will be done quickly so that we can fine-tune it if required.

Comments and suggestions are welcome.
