Stabilizing the benchmark

To observe speed improvements in code, first establish a stable baseline time. Getting a stable execution time for Smalltalk code using Time millisecondsToRun: [...] is difficult because raw time can vary dramatically depending on the initial and current state of the virtual machine. For example, the garbage collector may scavenge more often, or for longer, in one run than in another. This can give the appearance of improvement or degradation when no code has changed.
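The variation described above can be seen by timing the same block several times in a workspace. The following is a minimal sketch, not part of the Stats tool; the workload is a placeholder chosen only because it allocates many short-lived strings, and the variable names are illustrative.

```smalltalk
"Workspace sketch: time the same block ten times and report the spread."
| times fastest slowest |
times := (1 to: 10) collect: [:run |
    Time millisecondsToRun: [
        "Placeholder workload; substitute the code under test."
        (1 to: 100000) inject: 0 into: [:sum :n | sum + n printString size]]].
fastest := times inject: times first into: [:a :b | a min: b].
slowest := times inject: times first into: [:a :b | a max: b].
Transcript
    show: 'fastest: ', fastest printString, ' ms';
    show: '  slowest: ', slowest printString, ' ms';
    cr.
```

If the fastest and slowest times differ by more than the improvement you hope to measure, the baseline is not yet stable enough to draw conclusions from.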

Factors that affect stability

To be unobtrusive, the Stats tool allocates the memory it needs before it runs and creates as little garbage as possible while it runs. Large allocation requests do not occur between runs. The Stats tool does not mask how the application uses memory.

The following items in the Bench menu control the stability of raw execution time for a bench method:

Allow Interrupts

Lets asynchronous messages interrupt the execution of the bench method. Time spent in interrupt handlers is charged to the bench method.

To disallow asynchronous messages, uncheck this item. Be careful, though. When asynchronous messages are disabled, Smalltalk cannot be interrupted. This means that the programmer cannot break the execution of a bench method.

Empty New Space

Empties new space before each run.

The amount of available new space at the beginning of a run determines when a scavenge operation starts and how long it takes. When new space is emptied before each run, the raw time is stabilized because the number of scavenge operations and the time spent scavenging are more predictable for each run.

Compact Old Space

Makes Smalltalk perform global garbage collection before each run.

Very rarely, depending on the operation, Smalltalk must perform global garbage collection to satisfy an allocation request. When old space is compacted before each run, the raw time is stabilized because the number of global garbage collections and the time spent performing them are more predictable for each run.

Global garbage collection can take a long time; it is not necessary when stabilizing most results.
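Outside the Benchmark Workshop, the effect of Empty New Space and Compact Old Space can be approximated by hand by requesting garbage collection before each timed run. The sketch below assumes that the image responds to System scavenge and System globalGarbageCollect; verify both selectors in your release before relying on them. It is an illustration only and does not show how the Stats tool itself is implemented.

```smalltalk
"Assumption: System scavenge and System globalGarbageCollect are available
 in this image; substitute the equivalents for your release."
| times |
times := (1 to: 5) collect: [:run |
    System globalGarbageCollect.   "request a global collection (can be slow)"
    System scavenge.               "request a scavenge of new space"
    Time millisecondsToRun: [
        "Placeholder workload; substitute the code under test."
        (1 to: 100000) collect: [:n | n printString]]].
Transcript show: 'times (ms): ', times printString; cr.
```

Because the collection requests sit outside the timed block, their cost is not charged to the workload. Drop the global collection if the results are already stable without it.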

Flush Method Cache

Clears the compiled-method cache before each run.

Smalltalk uses a compiled-method cache so that methods that are executed frequently are not looked up in the class each time. On virtual machines that support dynamic translation, translated methods are also cached. This means that the first run of a method can be slower than subsequent runs. Clearing these caches before each run stabilizes raw time because every run then includes both the time spent translating and the time spent looking up methods.
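The difference between a first run and later runs can sometimes be observed directly by timing the first execution separately. This is a minimal workspace sketch with a placeholder workload; if the methods involved have already been executed in the image, the caches are already warm and the first run may show no difference.

```smalltalk
"Workspace sketch: compare the first run of a block with later runs."
| workload first later |
workload := [1 to: 200000 do: [:n | n printString size]].
first := Time millisecondsToRun: workload.
later := (1 to: 5) collect: [:run | Time millisecondsToRun: workload].
Transcript
    show: 'first run (ms): ', first printString;
    show: '  later runs (ms): ', later printString;
    cr.
```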

Stability is important in a baseline. This means that run [R] benchmarks should always be assessed for stability. For consistency and repeatability of results, sampled [S] and traced [T] benchmarks should share the same initial virtual machine conditions as run [R] benchmarks.

Factors beyond the control of the Stats tool

Other factors affecting the stability of a benchmark must be considered when establishing a baseline. The following factors are beyond the control of the Stats tool and may cause execution time to vary:

• Other OS processes and applications

• I/O operations (such as accessing the network or reading from a disk)

• Memory alignment and processor caching (especially using the 80x86 family)

• The OS memory manager (when committing and swapping pages)

• The OS API (the time in some OS calls can vary dramatically)

Last modified date: 01/29/2015