Optimizing Smalltalk code

Smalltalk User Guide : Packaging, unloading, and analyzing code : Analyzing the runtime performance of code : Analyzing code using the Benchmark Workshop : Optimizing Smalltalk code

Almost every project requires an optimization phase. This is especially true in the dynamic and highly productive world of Smalltalk. Often, ideas are rapidly prototyped and working code is hastily produced with little regard for efficiency. Even in production-quality code or finished subsystems, new features are added that can subtly affect performance.

Project managers and programming teams can leverage traditional Smalltalk productivity and produce high-quality, optimized code by the following actions:

•Scheduling time for optimization into every project

•Coding for functionality, clarity, and maintainability

•Writing and running test cases and benchmarks

•Finding the "right 10 percent" of the code to optimize

•Applying the rules of Smalltalk optimization (see The rules for optimizing Smalltalk code)

If every project requires an optimization phase, time needs to be allocated for optimization. In new projects this usually happens toward the end of the cycle when a team discovers how their code is being used.

When you are developing code, you need to focus on functionality. When you are prototyping, care is needed to ensure that prototype-quality code does not masquerade as production-quality code. Prototype code is notoriously slow, obtuse, and unmaintainable. This does not mean that prototyping is a bad approach for exploring new ideas; it is another tool to get the job done. You need to recognize prototype code and ensure that it is discarded, cleaned up, or rewritten before it enters the production system. To make significant performance gains, it is almost always necessary to have a deep understanding of the code. Poor-quality code defies optimization.

As part of writing the code, programmers usually write test cases, although sometimes other members of the team are assigned this task. Using the Stats tool, test cases can easily be turned into benchmarks and these benchmarks used to find "the right 10 percent".

Finally, the rules for optimizing Smalltalk code should be followed.

The rules for optimizing Smalltalk code

The rules for optimizing Smalltalk code are:

•Run less code.

•Send fewer messages.

•Avoid expensive operations.

•Run your code, not the garbage collector.

Run less code

Here's what you can do to run less code.

Scaling and complexity

Running less code means understanding scaling and complexity. An O(n) operation scales much better than an O(n^2) operation, but both perform poorly for large values of n.

It is also necessary to understand the minimum and fixed costs of an operation. Sometimes it is better to have a lower minimum cost for common cases in exchange for more complexity. This means that the operation does not scale well, but scaling may not be necessary. For example, a Dictionary does not scale well to huge databases, but it is faster than a BTree for smaller amounts of data.

Understanding complexity means also understanding the complexity of base classes. Arrays are very fast, OrderedCollections are somewhat slower, and Dictionaries and Sets are very slow by comparison. Seemingly unassuming methods such as asSet can be very expensive. Turning an arbitrary collection into a set is an O(n^2) operation in the worst case. For this reason, unnecessary conversion between collection types should be avoided.

The Smalltalk virtual machine is optimized for SmallIntegers. Avoiding LargeIntegers, Floats, and Fractions makes mathematical computations faster.

Running less code means designing for the most common cases. When testing a set of conditions, test the most common condition first. Do not perform expensive computations or lookup operations when the user expects realtime behavior, such as when drawing in a window. By precomputing and caching the required information, less code runs during the expensive operation.

Finally, running less code means improving the algorithm; a better algorithm achieves the same results and runs fewer operations. If the code is a little too slow, basic optimization techniques help. If the code is a great deal too slow, a new architecture, algorithm, or some strategically placed user primitives may be necessary. Performance tuning usually gets a 10-30% improvement unless a tight loop is optimized, in which case the improvement might be from two to five times. Changes in architecture, algorithm, and representation hold the promise of greater wins. Many fast applications have been written without resorting to performance tricks, simply by devising a clever algorithm or representation for the data.

Basic optimization

Occasionally, basic optimization techniques contravene good fundamental design principles. For example, coding a message-send operation inline can improve performance, but puts a copy of the code elsewhere in the system. If the original code is optimized or a bug in it is fixed, the copy is not updated. Sometimes it is necessary to balance optimization and design. In the "10% of the code" that gets optimized, compromise may be necessary.

The following guidelines are useful when writing and optimizing code:

•Investigate activity in loops. Computing loop invariants outside the loop reduces the number of computations inside the loop. In a tight loop, every message-send operation is expensive. See Send fewer messages.

•Just send the message. Avoid perform: and Blocks. The use of respondsTo: and isKindOf: often indicate a problem in the architecture of a system.

•Use in line message-send operations in time-critical methods.

•Use lazy initialization to avoid expensive computations that may not be necessary.

The following are known causes of performance problems:

•Using object-lookup operations excessively.

Objects are looked up in Smalltalk far more often than in other languages, due to the availability of base classes, such as Dictionaries with lookup operations. To reduce the number of lookup operations, design an object structure that gives direct access to the required object through a pointer chain of instance variables (as is done in Pascal or C). This gives a small fixed cost to the operation. For a badly hashed object, a Dictionary lookup operation is O(n) in the worst case.

•Using code that fails and retries an operation.

The following example shows how to avoid failing and retrying Integer>>#+:

"Slower. Fails the Integer>>#+ and retrys."

^1 + (2@3)

"Faster. Does not fail and calls Point>>#+."

^(2@3) + 1

"Fastest. Does not fail and does not call Point>>+."

^(2 + 1) @ (3 + 1).

•Sending the same message to the same object multiple times in the same method when the result does not change.

•Creating Symbols dynamically from Strings.

Because Symbols are guaranteed to be unique, each potentially new one must be looked up in the Smalltalk symbol table. If it is not found, it is added to the table.

•Using Dependents.

Dependents are implemented using Dictionaries, which use lookup operations.

•Using | or & in Boolean expressions.

Instead, use and: or or:.

•Using lazy initialization for values that are always needed.

If a value is always needed, lazy initialization of it slows down each access to it due to the overhead of the lazy check. Inappropriate lazy initialization makes using inline code difficult and error-prone.

•Using lazy initialization for values that are not expensive to compute.

For example, lazy-initializing a Boolean to true is slower than initializing it when the instance is created. This is because assigning the default value is very fast compared to the overhead of a lazy check each time the variable is accessed.

Send fewer messages

Sending fewer messages means that Smalltalk can stay executing longer inside a method. But how can you send fewer messages when everything in Smalltalk is a message-send operation?

The Smalltalk virtual machine performs special processing at run time depending on the selector and the class of the argument. Some message-send operations can be resolved inside the virtual machine without executing a method. Using these selectors can make a huge difference in a tight loop. Other selectors (particularly control and looping selectors) are optimized out by the compiler.

The following selectors are normally resolved for low-level classes, such as SmallInteger, or are highly optimized by the compiler and virtual machine:

•+, -, <, <=, >, >=, =, ~=, //, \\, *, /, between:and:

•==, ~~, isNil, notNil, and:, or:, not, yourself

•ifTrue:, ifFalse:, ifTrue:ifFalse:, ifFalse:ifTrue:

•whileTrue:, whileFalse:, timesRepeat:, to:do:

Virtual-machine optimizations can sometimes be defeated accidentally. For example, whileTrue: is treated specially when the receiver and both arguments are Blocks. The following two code fragments do the same work, but the second one is faster:

"Slow. A block is created and whileTrue: is sent."

| x block |

x := 1.

block := [x < 12].

block whileTrue: [x := x + 1].

"Fast. No block is created and the #whileTrue: is not sent."

| x block |

x := 1.

block := [x < 12].

[block value] whileTrue: x := x + 1].

In code that must be very fast, directly accessing instance variables improves performance. If the code is executed many times, directly accessing instance variables makes a huge difference because of the reduction in the number of message-send operations. Even though accessors are highly optimized by the virtual machine, they still must send the message.

In application code that reimplements doesNotUnderstand:, both the number of message-send operations and the time to process the original message-send are increased. This happens because the original message-send operation must first fail before doesNotUnderstand: runs. In doesNotUnderstand:, handlers that forward the original message, the overhead of the handler is incurred for each forwarded message.

Avoid expensive operations

Understanding which application program and system operations are expensive is key to writing fast applications. Using the highly optimized methods from the virtual machine and base class library improves the performance of Smalltalk code.

For example, String>>#indexOf: is a primitive in most images. It is much faster than writing a loop and testing the elements for equality or using another less-optimal method, such as findFirst:.

When in doubt about the speed of an operation, use the Stats tool to construct benchmarks and compare times.

Use CLDT methods instead of writing application program code to do the same work. CLDT has been highly optimized for most common operations.

Run your code--not the garbage collector

Even though garbage collectors are fast, they do use CPU power, which Smalltalk code could be using. It is better to create less garbage.

To find where the code is creating extra garbage, examine senders of new and new: in the code being optimized. Objects are also created by some primitives; for example, Float and LargeInteger operations (for example, +, -, /, and *).

Less well known is that the virtual machine creates objects to run certain methods. Methods that contain Blocks can result in object creation (for example, MethodContext and BlockContext). Exceptions to this rule are special control constructs that appear to use Blocks, such as whileTrue:, but in fact, often do not because the Blocks are often optimized out. In code that draws graphics, the point and rectangle operations are typical sources of excess garbage.

Last modified date: 05/19/2020