Support for double-byte characters
This section describes the classes that support double-byte characters. It is of particular interest to developers whose applications will be deployed in languages such as Japanese, whose character sets contain more than the 256 values a single byte can encode and therefore require double-byte characters.
Characters
Characters are defined to be unique objects that represent code points. Single-byte characters are restricted to values in the range 0 to 255; double-byte characters have values in the range 0 to 65535. How a character value is interpreted depends on the coded character set in use. Several Character protocols (those concerned with collation, character classification, and case conversion) have been refined or extended to be consistent with National Language Support requirements.
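A minimal sketch of the two value ranges, assuming the standard Smalltalk messages `Character value:` and `Character>>value` and the `Transcript` window are available in your image. The value 12354 is an arbitrary example above 255; which glyph it denotes depends on the coded character set of the current locale.

```smalltalk
| single double |
"A single-byte character: its value lies in the range 0 to 255."
single := Character value: 65.
"A double-byte character: its value lies above 255 (at most 65535).
 The glyph it stands for depends on the active coded character set."
double := Character value: 12354.
Transcript
    show: single value printString; cr;
    show: double value printString; cr.
```
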
String classes
The classes String and DBString are used to represent sequences of characters in VA Smalltalk. DBString represents a sequence of Characters that can take values between 0 and 65535 (that is, double-byte characters). Similarly, String represents a sequence of Characters that can take values between 0 and 255 (that is, single-byte characters). Both String and DBString support efficient character access and indexing operations. Common Widgets and Common File System subsystem protocols answer and accept instances of String or DBString as appropriate for the current locale and for the object being passed or returned.
VA Smalltalk does not support Symbols containing double-byte characters.
Sending the message asDBString converts a String or DBString into a DBString. Sending the message asSBString to an instance of String answers the receiver, because a String cannot contain double-byte characters. Sending asSBString to an instance of DBString converts the receiver into a String if possible; if the receiver contains any character with a value greater than 255, the conversion fails and the message answers nil.
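The conversions described above can be sketched as follows. This assumes that the literal 'abc' compiles to a String, that `DBString new: 1` creates a one-element DBString as it would for other indexable collections, and that `Character value: 300` is accepted in your image; the expected results in the comments restate the behavior described in the preceding paragraph.

```smalltalk
| sb db back wide |
sb := 'abc'.
"asDBString widens a String into a DBString."
db := sb asDBString.
"asSBString narrows it back; this succeeds because every value is <= 255."
back := db asSBString.
"asSBString sent to a String answers the receiver itself."
sb asSBString == sb.
"A DBString holding a character above 255 cannot be narrowed,
 so asSBString answers nil."
wide := DBString new: 1.
wide at: 1 put: (Character value: 300).
wide asSBString isNil.
```
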
Attempting to store a Character with a value greater than 255 into a String is an error. To determine whether a locale uses double-byte characters, VA Smalltalk provides the Locale>>isDBLocale method, which answers true if the receiver locale uses double-byte characters and false otherwise.
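A minimal sketch of both points: it avoids the error by choosing the string class from the character's value before storing it, then queries the current locale with isDBLocale. It assumes `DBString new: 1`, `String new: 1`, `Character value: 300`, and `Transcript show:` behave as in standard Smalltalk; the messages printed are illustrative only.

```smalltalk
| ch target |
ch := Character value: 300.
"Storing a character above 255 into a String is an error,
 so pick a DBString whenever the value does not fit in one byte."
target := ch value > 255
    ifTrue: [DBString new: 1]
    ifFalse: [String new: 1].
target at: 1 put: ch.
"isDBLocale answers true or false for the receiver locale."
Locale current isDBLocale
    ifTrue: [Transcript show: 'Current locale uses double-byte characters'; cr]
    ifFalse: [Transcript show: 'Current locale uses single-byte characters only'; cr].
```
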
A locale that supports double-byte characters should use DBString as its string class. When sent to an instance of Locale, the preferredStringClass message answers either String or DBString, whichever class is appropriate for the locale represented by the receiver. For example, if the current locale is #('japanese' 'japan'), then preferredStringClass answers DBString.
The preferredStringClass message is particularly useful for creating streams in application code. For example, the expression
WriteStream on: String new.
results in an exception if a double-byte character (a Character with a value greater than 255) is ever appended to this stream. The following code is portable because it uses the string class appropriate for the current locale:
WriteStream on: Locale current preferredStringClass new.
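A slightly longer sketch of the portable pattern in use. It appends ordinary text, then appends a character above 255 only when the stream is backed by a DBString, so it runs in both single-byte and double-byte locales. `Character value: 300` is an arbitrary example value and `Transcript` is assumed to be available.

```smalltalk
| cls stream |
"Ask the current locale which string class it prefers."
cls := Locale current preferredStringClass.
stream := WriteStream on: cls new.
stream nextPutAll: 'Hello'.
"Only a DBString can hold a character above 255."
cls == DBString ifTrue: [stream nextPut: (Character value: 300)].
Transcript show: stream contents size printString; cr.
```

In a double-byte locale the contents have six characters; in a single-byte locale the guarded append is skipped and the contents have five.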
Last modified date: 01/29/2015