EsUnicodeEncoding
Description
Abstract container class whose subclasses implement the Unicode encoding standards (UTF-8, UTF-16, UTF-32 and their byte-order variants).
Responsibility
Collection class for associated view
(#contents)
'smalltalk' asUnicodeString utf8 contents isKindOf: Utf8
Conversion to a UnicodeString
(#asUnicodeString, #asUnicodeString:)
(Utf32LE with: 16r1F600) asUnicodeString first name = 'GRINNING FACE'
Conversion to any other unicode encoding container class
(#asUtf8, #asUtf16, #asUtf16LE, #asUtf16BE, #asUtf32, #asUtf32LE, #asUtf32BE...)
(Utf32 with: 16r1F600) asUtf16BE asUtf32LE asUtf16LE asUtf8 asUtf32 = (Utf32 with: 16r1F600)
Validate content according to unicode encoding rules of the container class
(#isValid, #isInvalid)
'a' asUnicodeString utf8 contents isValid.
(Utf8 with: 233) isInvalid
Container class that can be iterated in code unit slot sizes
(<Utf8> is a byte class with 8-bit slots, <Utf16> is a word class with 16-bit slots, <Utf32> is a long class with 32-bit slots)
16r1F600 asUnicodeString utf16 contents size = 2
Examples
Convert utf8 -> utf16BE -> utf32LE -> utf8
| utf8 |
 
utf8 := 'Smalltalk' utf8 contents asUtf16BE asUtf32LE asUtf8.
self assert: [utf8 asUnicodeString asSBString = 'Smalltalk']
Detect validity of UTF-16LE
"Valid UTF-16LE"
self assert: [(Utf16LE with: 16r97) isValid].
 
"Invalid UTF-16LE (Surrogate range)"
self assert: [(Utf16LE with: 16rD800) isInvalid].
Repair and convert invalid UTF-32BE to a UnicodeString
| invalidUtf32BE repairedUniStr |
 
invalidUtf32BE := Utf32BE with: 16rD834.
self assert: [invalidUtf32BE isInvalid].
repairedUniStr := invalidUtf32BE asUnicodeString: true.
self assert: [repairedUniStr size = 1 and: [repairedUniStr unicodeScalars first = UnicodeScalar replacementCharacter]]
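Compare code unit counts across encodings
The following is a minimal sketch rather than an excerpt from the product. It assumes that #size on a container answers the number of code unit slots, as the utf16 responsibility example above suggests. U+1F600 lies outside the Basic Multilingual Plane, so it needs one UTF-32 code unit, a surrogate pair in UTF-16, and four bytes in UTF-8.

```smalltalk
| grinning |

"One scalar, U+1F600 GRINNING FACE, held as a single UTF-32 code unit."
grinning := Utf32 with: 16r1F600.

self assert: [grinning size = 1].

"UTF-16 needs a surrogate pair for scalars above 16rFFFF."
self assert: [grinning asUtf16 size = 2].

"UTF-8 needs four bytes for scalars above 16rFFFF."
self assert: [grinning asUtf8 size = 4]
```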
Class Methods
None
Instance Methods
asUnicodeString
   Answer the receiver as a <UnicodeString> instance.
   Do not repair invalid sequences, but instead raise
   an exception.
   
   Answers:
    <UnicodeString>
   Raises:
    <Exception> receiver with invalid content cannot be converted to unicode string
asUnicodeString:
   Answer the receiver as a <UnicodeString> instance.
   If @repair is true, then repair invalid encodings such
   that a valid unicode string can be created.
   
   Repairing typically involves detecting invalid sequences
   and replacing with the unicode replacement character
   [UnicodeScalar replacementCharacter]

   @see method comments in subclass overrides for examples.

   Arguments:
    repair - <Boolean>   
   Answers:
    <UnicodeString>
   Raises:
    <Exception> if @repair is false and receiver contains invalid contents
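
   A minimal sketch of the two modes, reusing the invalid
   UTF-32BE value from the Examples section. The handler
   catches the general <Exception> class because the specific
   exception class is not named here; answering nil from the
   handler is an illustrative choice, not documented behavior.

```smalltalk
| invalid strict repaired |

invalid := Utf32BE with: 16rD834.

"Strict mode: invalid content is expected to raise, so the handler answers nil."
strict := [invalid asUnicodeString: false]
	on: Exception
	do: [:ex | ex return: nil].
self assert: [strict isNil].

"Repair mode: invalid content is replaced rather than rejected."
repaired := invalid asUnicodeString: true.
self assert: [repaired unicodeScalars first = UnicodeScalar replacementCharacter]
```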
asUtf16
   Answer the receiver as a <Utf16> instance.
   Do not repair invalid sequences, but instead raise
   an exception.
   
   Answers:
    <Utf16>
   Raises:
    <Exception> receiver with invalid content cannot be converted to utf16
asUtf16:
   Answer the receiver as a <Utf16> instance.
   If @repair is true, then repair invalid encoded elements
   such that a valid utf16 can be created.
   
   Repairing typically involves detecting invalid sequences
   and replacing with the unicode replacement character
   [UnicodeScalar replacementCharacter]

   Arguments:
    repair - <Boolean>   
   Answers:
    <Utf16>
   Raises:
    <Exception> if @repair is false and receiver contains invalid contents
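
   A short sketch of a repairing conversion between container
   classes, assuming the repaired result satisfies #isValid as
   the description above implies. The same pattern should apply
   to the other keyword conversions (#asUtf8:, #asUtf16BE:,
   #asUtf32: and so on).

```smalltalk
| invalid repaired |

"A lone lead byte (16rE9 = 233) is not a complete UTF-8 sequence."
invalid := Utf8 with: 233.
self assert: [invalid isInvalid].

"Convert with repair enabled; the result is expected to be well-formed UTF-16."
repaired := invalid asUtf16: true.
self assert: [repaired isValid]
```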
asUtf16BE
   Answer the receiver as a <Utf16BE> instance.
   Do not repair invalid sequences, but instead raise
   an exception.
   
   Answers:
    <Utf16BE>
   Raises:
    <Exception> receiver with invalid content cannot be converted to utf16 big endian
asUtf16BE:
   Answer the receiver as a <Utf16BE> instance.
   If @repair is true, then repair invalid encoded elements
   such that a valid utf16 can be created.
   
   Repairing typically involves detecting invalid sequences
   and replacing with the unicode replacement character
   [UnicodeScalar replacementCharacter]

   Arguments:
    repair - <Boolean>   
   Answers:
    <Utf16BE>
   Raises:
    <Exception> if @repair is false and receiver contains invalid contents
asUtf16LE
   Answer the receiver as a <Utf16LE> instance.
   Do not repair invalid sequences, but instead raise
   an exception.
   
   Answers:
    <Utf16LE>
   Raises:
    <Exception> receiver with invalid content cannot be converted to utf16 little endian
asUtf16LE:
   Answer the receiver as a <Utf16LE> instance.
   If @repair is true, then repair invalid encoded elements
   such that a valid utf16 can be created.
   
   Repairing typically involves detecting invalid sequences
   and replacing with the unicode replacement character
   [UnicodeScalar replacementCharacter]

   Arguments:
    repair - <Boolean>   
   Answers:
    <Utf16LE>
   Raises:
    <Exception> if @repair is false and receiver contains invalid contents
asUtf32
   Answer the receiver as a <Utf32> instance.
   Do not repair invalid sequences, but instead raise
   an exception.
   
   Answers:
    <Utf32>
   Raises:
    <Exception> receiver with invalid content cannot be converted to utf32
asUtf32:
   Answer the receiver as a <Utf32> instance.
   If @repair is true, then repair invalid encoded elements
   such that a valid utf32 can be created.
   
   Repairing typically involves detecting invalid sequences
   and replacing with the unicode replacement character
   [UnicodeScalar replacementCharacter]

   Arguments:
    repair - <Boolean>   
   Answers:
    <Utf32>
   Raises:
    <Exception> if @repair is false and receiver contains invalid contents
asUtf32BE
   Answer the receiver as a <Utf32BE> instance.
   Do not repair invalid sequences, but instead raise
   an exception.
   
   Answers:
    <Utf32BE>
   Raises:
    <Exception> receiver with invalid content cannot be converted to utf32 big endian
asUtf32BE:
   Answer the receiver as a <Utf32BE> instance.
   If @repair is true, then repair invalid encoded elements
   such that a valid utf32 can be created.
   
   Repairing typically involves detecting invalid sequences
   and replacing with the unicode replacement character
   [UnicodeScalar replacementCharacter]
   
   Arguments:
    repair - <Boolean>
   Answers:
    <Utf32BE>
   Raises:
    <Exception> if @repair is false and receiver contains invalid contents
asUtf32LE
   Answer the receiver as a <Utf32LE> instance.
   Do not repair invalid sequences, but instead raise
   an exception.
   
   Answers:
    <Utf32LE>
   Raises:
    <Exception> receiver with invalid content cannot be converted to utf32 little endian
asUtf32LE:
   Answer the receiver as a <Utf32LE> instance.
   If @repair is true, then repair invalid encoded elements
   such that a valid utf32 can be created.
   
   Repairing typically involves detecting invalid sequences
   and replacing with the unicode replacement character
   [UnicodeScalar replacementCharacter]

   Arguments:
    repair - <Boolean>   
   Answers:
    <Utf32LE>
   Raises:
    <Exception> if @repair is false and receiver contains invalid contents
asUtf8
   Answer the receiver as a <Utf8> instance.
   Do not repair invalid sequences, but instead raise
   an exception.
   
   Answers:
    <Utf8>
   Raises:
    <Exception> receiver with invalid content cannot be converted to utf8
asUtf8:
   Answer the receiver as a <Utf8> instance.
   If @repair is true, then repair invalid encoded elements
   such that a valid utf8 can be created.
   
   Repairing typically involves detecting invalid sequences
   and replacing with the unicode replacement character
   [UnicodeScalar replacementCharacter]

   Arguments:
    repair - <Boolean>   
   Answers:
    <Utf8>
   Raises:
    <Exception> if @repair is false and receiver contains invalid contents
isInvalid
   Answer true if the content of the container is invalid according to
   the rules of the encoding.
   
   Answers:
    <Boolean> true if invalid, false if valid
isValid
   Answer true if the content of the container is valid according to
   the rules of the encoding.
   
   @see method comments in subclass overrides for examples.
   
   Answers:
    <Boolean> true if valid, false if invalid
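
   One way to use this test, sketched under the assumption that
   checking first is preferable to handling an exception: convert
   strictly when the content is valid and fall back to a repairing
   conversion otherwise.

```smalltalk
| units text |

units := Utf8 with: 233.

"Choose the strict or the repairing conversion based on validity."
text := units isValid
	ifTrue: [units asUnicodeString]
	ifFalse: [units asUnicodeString: true].

"text is a <UnicodeString> in both branches; for this input the repairing branch is taken."
text
```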
Last modified date: 01/18/2023