
Lazy Optimization

Ken Auer, President, RoleModel Software, Inc.
Kent Beck, First Class Software, Inc.


We both preach to our clients about performance tuning and productive programming. "Make it run, make it right, make it fast," is the mantra. As much as they seem to like the idea, in reality it makes them very nervous.

One client in particular had gotten a full dose of this medicine, and was moving ahead nicely on a complex application that stored Smalltalk objects in a relational database. They had gotten it to work, and then made sure the design for the overall framework was very elegant and flexible. Unfortunately, the system was not nearly as responsive as the user wanted. It was time to "make it fast" and time was running out.

Later, a developer pulled Kent aside and said, "You got me in trouble."

"About what?"

"You said I could tune performance after I got the design right."

"Yes, I did. What happened?"

"We spent all that time getting the design right. Then the users wanted sub-second response time. The system was taking as long as fifteen seconds to respond to a query. It took me all weekend to change the design around to fix the performance problem."

"How fast is it now?"

"Well, now it comes back in a third of a second."

"So, what’s the problem?"

"I was really nervous that I wouldn’t be able to fix the performance problem."

And that’s why there is a pattern called "Performance Assessment". It probably won’t change anything about how you program, but it will keep you from getting nervous.

Software development is a delicate piece of chamber music. If any one instrument protrudes, the quality of the piece is ruined.

Although many performance problems stem from lack of a coherent design, many others stem from solutions which optimize readability or flexibility over resource usage. In many cases this is the right tradeoff, as the costs of a system over its lifetime are dominated by how well it communicates to other humans. When the program has to go fast, though, it just has to go fast.

Languages (like Smalltalk) provide power that allows us to manage complexity, hiding how much binary logic is actually being exercised behind the functionality we take for granted. This isolation from detail makes it easier to write communicative but inefficient code.

Tuning in Smalltalk is the same as in any language. You either:

The primary forces driving the production of efficient programs are:

Below is a partial pattern language for creating efficient Smalltalk programs. Many of the higher level patterns are not specific to Smalltalk at all, but could be applied to all programs. Many of the more detailed patterns could also be applied to most languages, and most definitely object-oriented languages. However, in the interest of brevity and narrowing our audience, we chose to illustrate the patterns by assuming their application to Smalltalk. Additionally, as patterns get to lower levels of detail and move into the realms of "idioms" they tend to be less applicable to a wide range of languages. The authors have a high level of confidence in these patterns when applied to Smalltalk. We invite experts in other programming environments to use this as a template for a similar pattern language for their own environments and would be surprised if they did not find their own unique language specific patterns (or idioms).

 

Here is a graphical map of the patterns in this language. Patterns referenced but not included are in gray.

Performance Assessment

Prolog

You are ready to begin developing the Use Cases in your Commitment Schedule in earnest.

Problem

How can you develop with the confidence that performance won’t be a problem?

Forces

Early in program development is where you are most likely to hear the siren call of premature optimization. Once you are into the thick of development, you will be unlikely to have energy or thoughts for anything but getting the system done and done right. Unfortunately, it is early in development where premature optimization has the opportunity to do the most damage.

Early in development is when you should be listening carefully to the users’ needs and taking stock of your available resources. If every thought is poisoned by "Yes, but how fast will that run?", you will likely miss the chance to make large leaps in development.

At the same time, performance is a legitimate concern. If the program doesn’t go fast enough, you will not in fact get paid (or keep getting paid). You need to assure yourself and your client that performance problems will not sink an otherwise brilliantly crafted and executed system. Therefore:

Solution

Indulge your performance worries by writing a short (e.g., two page) performance assessment of the system. Identify critical resources outside of Smalltalk. Do the math to figure out how big and how fast the final system will be. Prototype areas of concern if absolutely necessary.

Discussion

Some people call these "envelope calculations," after their tendency to be done on the back of an envelope. A typical performance assessment might conclude that only one mainframe access is allowable per user transaction.
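For example (the numbers here are invented purely for illustration): if each user transaction must complete in under two seconds, and one mainframe round trip costs roughly a second of latency, the arithmetic allows at most one mainframe access per transaction. A design that fetched 50 detail records individually would cost on the order of 50 seconds, so it is ruled out before a line of code is written.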

Hard real time systems have a long history of careful performance assessments. When you know you have to service an interrupt in 10 microseconds, well, that pretty much shapes at least one major subsystem. Most systems, although often called real time, are not nearly so restrictive.

A prototype may be required for any unique area of your system that you deem somewhat complex and whose responsiveness is critical to the success of your application. For example, you may have a map that needs to display hundreds of small labels and icons representing particular locations. Throw together a prototype that pumps hundreds of random labels and icons to the screen in random spots. If it’s too slow, figure out what’s taking all the time and tune the prototype. When you are convinced it can be done (or can’t), throw out the prototype and start the real work (or revise your expectations).

Epilog

The Performance Assessment will help you decide how to shape the Concise Design. You may be able to define your Performance Criteria directly from the assessment.

Lazy Optimization

Prolog

Early Program quickly creates systems with the needed functionality. Concise Design refactors objects to improve clarity and flexibility.

Problem

How do you achieve acceptable performance with the least short and long term cost?

Forces

Users should have the experience of being in control of the computer, not the other way around. Even the best designed User Interface will frustrate and hinder a user if it does not respond with dispatch.

Some programs spend considerable time in Smalltalk. These programs will be efficient to the degree they leverage but do not stress Smalltalk resources such as message sending and the garbage collector.

Other programs are limited by resources outside the direct control of the programming language: virtual memory, database access, graphics calls, etc. Another standard of efficiency is how effectively a program uses those resources, how short the path is from a stimulus from the outside world to a stimulus for the outside world.

Almost every professional programmer has had the experience of writing a program, taking heat because it is too slow, and (importantly) not being able to fix the performance problems in a timely fashion. Some programmers react to this rite of passage by insisting that performance is a legitimate concern throughout development. No request for functionality or proposal for design refinement can be considered until it has passed the "can it run fast" test.

This practice is the death of effective development. Progress slows to a crawl as every thought is forced to pass through the sieve of fear. Decisions are optimized for presumed performance impact rather than short or long term programming productivity and low risk. The result is large programs with limited functionality. They grow progressively more and more complex until they cease being valuable to the developing organization.

The irony is that such programs are virtually impossible to keep running fast. Their lack of clarity robs you of the ability to identify performance bottlenecks. Even if you could find a problem, chances are you could not address it without making widespread changes, exactly the sort of activity you must avoid late in development.

So, you have to worry about performance or your users won’t be satisfied, but it doesn’t help to worry about it all the time. Therefore:

Solution

Ignore efficiency through most of the development cycle. Tune performance once the program is running correctly and the design reflects your best understanding of how the code should be structured. The needed changes will be limited in scope or will illuminate opportunities for better design.

Discussion

It is critical that this strategy is incorporated into your project plan. Plan on some optimization activities toward the end or at natural breaks, when functionality of a section is expected to be complete.

Not only will this strategy give you actual time to do the necessary optimization, but it will also appease early viewers of your program who deem it too slow. You can always say, "Yes, we know it is too slow. You can see where, in our project plan, we plan on addressing this issue." Of course, if you don’t give yourself enough time to actually do the tuning, you’re still going to look bad. One rule of thumb is to assume 35% of the development time will be required for tuning by those inexperienced at the process of tuning in Smalltalk, decreasing to 10% for experienced developers.

Epilog

Agree with the client on the Performance Criteria by which the program will be measured. Create a Performance Measurement that automatically runs the part of the program judged by users to be too slow.

Performance Criteria

Prolog

The Performance Assessment may have discovered easily quantifiable criteria. Later, reviewers of your system working after your Concise Design may have identified some concerns.

Problem

How do you know what is worth tuning and what is not?

Forces

The naive view is that the whole program should be as fast as possible. This leads you to either commit to less functionality than you should implement or muddle the design so badly you can’t get anything to work right, much less work fast.

Many parts of your system will work without any complaints about performance even if they are implemented the simplest possible way. In addition, you cannot be sure in advance where performance bottlenecks will lie[Barry91]. You can’t even be sure which bottlenecks will bother users and which won’t. Therefore:

Solution

Ask your client to give you ballpark response time or throughput criteria for your system. Be prepared to revise this list as the system matures.

Discussion

At first you just want to know what the client is sure to notice. Typical performance criteria are:
  • Interactive response time
  • Frame rate
  • Bits per second
  • Transactions per second

As time goes on, your client (or you) may notice new things that just are not snappy enough. Make sure the particular complaints are specific and goals are explicitly stated.
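For example, a vague complaint such as "the list feels sluggish" should be restated as an explicit, testable goal (the numbers here are purely illustrative): "scrolling the order list repaints within 100 milliseconds," or "a search by name returns its first page of results in under one second on the delivery hardware."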

Epilog

Performance Criteria need a Threshold Switch to give you a concrete goal. Each Criterion should be accompanied by a set of Performance Measurements.

Threshold Switch

Prolog

You have identified the performance criteria for your system. You have begun to improve the problem areas.

Problem

How fast is fast enough?

Forces

You might think that once you have settled on a Performance Criterion, your job is to make it as fast as possible. This is almost never true.

Programs that are to be used by people are both constrained by and can take advantage of the quirks of human perception. In particular, actual performance and perceived performance are not linearly related.

Once you get past a perceptual threshold, substantial further measured performance improvement will result in little or no perceived improvement. Therefore:

Solution

Agree that as soon as you meet the Performance Criteria, you will stop tuning. If you’ve beaten the pants off of it (e.g., you needed sub-second, and it now performs in 0.2 seconds) with the first Experiment, great. However, don’t make new goals of seeing how much you can beat the Criteria. When you hit the threshold of what is needed, stop tuning!

Discussion

Always remember that the major cost of software is in maintenance. Most techniques used to improve performance sacrifice a bit of maintainability; avoid them once the Criteria are met.

Epilog

Once you’ve agreed to the concept of a Threshold Switch, you are ready to begin the actual process of tuning. However, before you do, you must gather Performance Measurements to make the Switches real.

Performance Measurement

Prolog

You’ve established Performance Criteria and agreed to Threshold Switches.

Problem

How do you consistently measure performance, ensure that the system keeps running while you are tuning, and know when you have triggered your Threshold Switches?

Forces

As already noted, programmers typically aren't very good at identifying bottlenecks. Experienced programmers (those who have made more mistakes than you have ever dreamed of making) may be able to make a good guess at what's causing these bottlenecks. If you've got one of these people around, getting them to guess would be a great improvement over guessing yourself.

If you have a good list of efficiency hacks, e.g., "when you see A, do B instead", you could look for all occurrences of the "slow" ways of doing things (the As) and replace them with their faster counterparts (the Bs). Odds are good you'll end up with a much faster program, but it will take an awfully long time to find all of the potentially slow things you've done. Also, there may be so many places to change that you might wonder why you didn't just write your code like that in the first place. Then you'll look at the probably unreadable result and remember why. Chances are you will have replaced a lot of code that would have run "fast enough" if you had just left it alone.

Even if you've done this exhaustive search and destroy, you may have missed some bottlenecks that were not in your catalog of efficiency tricks. You might even have missed the one or two key bottlenecks that, if corrected first, would have made all the other efficiency hacks unnecessary. Therefore:

Solution

Always write a routine that tests and times the apparent performance problems in such a way that it does not rely on human intervention. Use a profiler to identify the biggest bottlenecks in the area in question.
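A minimal sketch of such a routine, assuming a hypothetical OrderQuery class and #runSampleQuery operation standing in for whatever part of the system users judged too slow:

```smalltalk
"Performance Measurement sketch: the class OrderQuery, the selector
 #runSampleQuery, and the 1000 ms threshold are all invented for
 illustration; substitute your own slow operation and Threshold Switch."
measureQueryPerformance
	| milliseconds |
	milliseconds := Time millisecondsToRun:
		[OrderQuery new runSampleQuery].
	Transcript
		show: 'Query took ', milliseconds printString, ' ms';
		cr.
	^milliseconds <= 1000	"true once the Threshold Switch is met"
```

Because the routine needs no human intervention, it can be rerun after every Experiment to confirm progress against the Threshold Switch.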

Discussion

A VisualWorks customer was very concerned about the performance of their system. They were concerned that all the horror stories they heard about Smalltalk being too slow for real systems were true. It seemed that the fundamental section of their system was taking minutes to do anything, and they needed it to take seconds. They couldn’t figure out what was taking so long no matter how hard they tried.

In a last-ditch effort to figure out whether their work could be salvaged, they called in Dave Liebs, Smalltalk performance guru from ParcPlace Systems, Inc., for a day of consulting. They quickly described an overview of what they were trying to do. Nothing stood out to Dave as anything that would bring Smalltalk to its knees. So he asked,

"Where did the Profiler indicate most of the time was being spent?"

"What’s the Profiler?"

So he spent the next 30 minutes or so explaining what the Profiler was and helping them install it.

He then ran the profiler on the section of code in question and found it was spending more than 98% of its time in Date>>printOn:. Dave thought for a second about what might be so slow about Date>>printOn: and felt it should not cause so great a problem. So he looked a little higher up the profile tree and examined the containing method.

He noticed the expression ‘Date today printString’ inside of a loop. So he chose an Experiment and pulled the expression out of the loop, storing the value in a temporary variable which he then used inside the loop. Ten minutes after running the Profiler for the first time, he ran the code again. This critical section of code that was central to the client’s system now took less than a second.

Dave then turned to his hosts and said, "So, what are we going to do for the rest of the day?"
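The change Dave made can be sketched as a before-and-after; the surrounding method below is a reconstruction for illustration, not the client’s actual code:

```smalltalk
"Before: Date today printString is recomputed on every pass through the loop."
printRowsOn: aStream
	self rows do:
		[:row |
		aStream nextPutAll: Date today printString; space.
		row printOn: aStream.
		aStream cr]

"After: the invariant expression is hoisted into a temporary variable."
printRowsOn: aStream
	| dateString |
	dateString := Date today printString.
	self rows do:
		[:row |
		aStream nextPutAll: dateString; space.
		row printOn: aStream.
		aStream cr]
```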

Epilog

After addressing the bottleneck (see [Hot Spot]), run the Performance Measurement again to verify whether you actually improved performance, and whether you've brought it down to meet the Performance Criteria. If you have not, continue to use [Hot Spot] until you've met the criteria or you have reached a point where you must either give up or [Rethink Approach]. Before moving on to the next bottleneck, verify that it still exists: tuning a well-factored method often propagates performance improvements to other areas that rely on that method. You may have killed two or more birds with one stone.

Hot Spot

Prolog

You’ve identified several areas of your system that don’t meet your Performance Criteria.

Problem

Where do you actually begin tuning?

Forces

You could just list the problem areas, assign them as tasks on a queue, and knock them off one at a time. Individuals working on your project could simply take the next problem on the list. The order in which they are addressed wouldn't really matter if you insist all of them must be addressed, but you should probably prioritize them: some may be nastier than others, and you don't want to spend time fixing the simple ones if the critical ones are the real show-stoppers. If individuals finished their task before others, they could just take the next available task. Pretty simple, isn't it?

However, dividing the work this way takes away the collaborative effort of individuals and the benefit of finding commonality of problems.

On the other hand, if you have everybody standing around analyzing the best way to go about solving the problems, they won't ever get addressed.

Solution

Profile most or all performance problems before putting a lot of energy into prioritizing, assigning, and fixing them. Identify the apparent places (at a more micro level) where the majority of time is going in each. These are the Hot Spots on which you should focus your tuning efforts. You may be surprised to find that multiple performance problem areas are due to a particular approach that spans multiple areas. The best approach to address these hot spots may be very different when taking into account several contexts in which the problems occur, and priority items will become more apparent. Once these are addressed, if performance problems persist, repeat this pattern to identify the next set of hot spots.

Discussion

If you are working on a large system, you may want to first categorize problems and run the profiler on one category of problems at a time. Avoid waiting until the whole team is ready to tune before starting your profiling efforts. Again, find natural breaking points in the project (see [Lazy Optimization]) where a developer is available to work on areas whose Concise Design is complete.

Epilog

In addressing the Hot Spots, look through the remaining patterns to see whether the corresponding code matches the symptoms, and apply the suggested solutions.

Experiment

Prolog

You have identified an apparent Hot Spot but are not certain you have the solution that will get you to your Threshold Switch.

Problem

How do I find the best performance solution?

Forces

The mindset at this point in time is "I need to make this thing faster or I’m dead meat".

However, we must remember the points made in Threshold Switch with respect to efficiency hacks sacrificing later maintainability. Therefore:

Solution

Treat each application of an efficiency hack as an experiment. If it didn’t get you to the desired Criteria (e.g., you needed 10x improvement and it gave you 0.1x), make a note of what it did give you and undo it. Keep only the experiments that actually get you to the criteria. You may have to combine several to get you there, but concentrate on the big bang experiments vs. the little improvements.

Discussion

In the discussion of Performance Measurement we presented a problem which may have suggested that we needed to tune Date>>printOn:. Suppose Dave Liebs took some time ripping apart the Date>>printOn: method and putting it back together so it was 40% faster, albeit less concise and in danger of blowing up in the fourth leap year of the 21st century. He then conducted the Experiment (Caching Temporary Variable) in the higher-level method that gave him the big bang. Keeping the faster printOn: method would now have the critical section run in 0.92 seconds instead of 0.93.

Not only is the new printOn: method not worth keeping for performance reasons, but consider the future ramifications of leaving it in. Some poor maintenance programmer comes along with the thankless job of upgrading the system to the next release of VisualWorks. He first tries to find all of the base image changes his company has made that need to be carried forward. He discovers this change (which may or may not have a comment explaining what was done). He also notices that the new version of VisualWorks has a different version of the method than the previous one (before the hacked change). People will notice the time spent figuring out what to do here far more than they ever noticed the 0.01 seconds it bought them in the first release of the product.

Epilog

You should be getting closer to your desired Performance Criteria with little impact to your Concise Design.

Cacheable Expression

Prolog

The Profiler has identified a particular expression as a significant contributor to execution time. An expression is executed many times, often with the same result.

Problem

How do we reduce the overhead of this repeated expression with the least impact on the integrity of the system?

Forces

As in previous patterns, we would rather not compromise the design of the system in removing bottlenecks. We would also prefer to minimize the overhead of objects. Additional variables (e.g., instance variables) often proliferate side effects and additional supporting methods, not to mention the memory allocated to hold the corresponding value. A variable cache eliminates design possibilities and offers opportunities to break encapsulation.

On the other hand, the result of an expression takes up memory somewhere (e.g. the execution stack) no matter what. We shouldn’t get too carried away with the avoidance of memory usage.

Solution

If you find an expression that often returns the same result, find the lowest-overhead type of variable that will alleviate the redundancy. This will yield a performance improvement while minimizing the design compromises [Auer95].

Discussion

Several years ago, a major portion of a Process Modelling System with which we were involved was devoted to re-routing and re-drawing diagrams.

After a redesign of a prototype (Concise Design), performance had improved significantly (3-5x) without performance being a primary goal. However, this performance was still deemed unacceptable. Performance Criteria were established.

In a period of one week, system response time was improved by roughly one order of magnitude [White91]. Due to the Concise Design there was little to overhaul. Most of these performance improvements came from finding common expressions and speeding them up by Caching Expressions.

On the other hand, keeping a cache may force other methods to be aware of it. For example, if a Rectangle caches its area in an instance variable, all methods that allow changes to the Rectangle’s coordinates must do something to make sure the cached area is reset.

Caching variables can also threaten the integrity of the object containing the cache itself. For example, before the cached area instance variable is added, the following expression (although bad practice) would not threaten the integrity of the object:

	aRectangle origin y: 25

The implementor of Rectangle chose to take the risk of exposing its instance variable for a variety of reasons, knowing that, as the implementation stood, such an expression would not break anything.

As soon as the Cached Instance Variable is added, such an expression would cause a problem, as there would be no way to trigger a cache reset without additional mechanisms.

Epilog

Overhead and risk typically are lowest for [Caching Temporary Variable], followed by [Caching Argument], [Caching Class Variable], [Caching Class Instance Variable], and [Caching Instance Variable].

Caching Temporary Variable

Prolog

A Cacheable Expression is causing a Hot Spot, most often because it occurs in a loop.

Problem

How do you increase the performance of the loop?

Forces

Sacrificing good design will cost you during maintenance. Introducing a cache introduces the possibility that its value will become incorrect during the life of the method.

Often, seemingly harmless expressions inside a loop return the same result each time. Therefore:

Solution

For each expensive expression in the loop that returns the same value each time through, add a temporary variable that is set to the result of the expression before the loop begins, and replace all occurrences of the expression with the corresponding temporary variable.

Discussion

For example, if you wanted to collect a subset of a table based on selected column names, your code may look something like:

     

	collectColumns: anArrayOfColumnNames
		^self rows collect:
			[:row |
			anArrayOfColumnNames collect: [:name | row at: (self columnIndexFor: name)]]

	columnIndexFor: aColumnName
		^self columnNameMaps at: aColumnName

Realizing the column indices of interest are going to be the same each time through the loop, you could greatly reduce the number of times #columnIndexFor: is sent as follows:

	collectColumns: anArrayOfColumnNames
		| columnIndices |
		columnIndices := anArrayOfColumnNames collect: [:name | self columnIndexFor: name].
		^self rows collect:
			[:row |
			columnIndices collect: [:index | row at: index]]
Epilog

Although this may fix many problems, the expression might still be repeated many times due to the pervasiveness of a containing method, and performance may still not meet the Performance Criteria. If so, look for caching with a larger scope, such as [Caching Instance Variable].

Caching Argument

Prolog

Multiple objects are creating similar transient objects to create pieces of the desired end result.

Problem

How can we reduce the number of transient objects?

Forces

Keeping a "collector" object in an instance, class, class instance, or global variable (depending on scope) has undesirable costs. Although it increases performance, it also adds a lot of overhead and tends to hold onto objects that are not needed past the completion of an operation. Therefore:

Solution

Keep a single transient object that can be used by the multiple objects which contribute to the final result, and pass it around as an argument to all the objects that can put it to use.

Discussion

Here is an example of an algorithm that uses collection concatenation, which causes lots of transient objects to be created:

	printList
		| list |
		self isEmpty ifTrue: [^String new].
		list := self first printString.
		2 to: self size do:
			[:index |
			list := list, ', ', (self at: index) printString].
		^list

Objects convert themselves into strings by creating a Stream on an empty string and sending themselves the printOn: message with the Stream as an argument. We could instead create a single Stream to which all objects would write:

	printList
		| stream |
		stream := WriteStream on: String new.
		self do:
			[:each |
			each printOn: stream.
			stream nextPutAll: ', '].
		self isEmpty ifFalse: [stream skip: -2].
		^stream contents
Another example would be to pass around a collection to which all objects that had something to contribute would add:

	allFemaleChildren := Set new.
	self people do: [:each | each addFemaleChildrenInto: allFemaleChildren]

The addFemaleChildrenInto: method would do nothing but add to the passed collection, or do nothing at all if there are no female children. It could be used instead of:

	allFemaleChildren := Set new.
	self people do: [:each | allFemaleChildren addAll: each allFemaleChildren]

which would create an intermediate collection for each person, maybe even an empty one.

Epilog

Applying this pattern may result in modifications to multiple parts of the system if protocol must be changed to use the Caching Argument. Typically, this can be avoided by adapting the old protocol so that it is still supported, implemented on top of the new protocol with the Caching Argument as its foundation. However, this may potentially hurt the performance of systems using the original protocol. Use Performance Measurement and Performance Criteria to identify whether this is a problem.

    Caching State Variable

    Prolog

    A Cacheable Expression is causing a Hot Spot, but it cannot simply be cached in an argument or temporary variables since the contexts in which it is necessary are varied.

    Problem

    How do you increase the performance of expressions that are needed in a variety of contexts?

    Forces

    You never want to limit design possibilities. No one can tell what new requirements will come later and you would like to easily map these requirements into simple methods in existing classes. Every time you add a state variable, a burden is placed on the developer to make sure every method that may have an effect on that state is aware of it. It can no longer be concerned with simply, giving an answer. It needs to be concerned with side-effects.

    However, sometimes, ignoring the implied side-effects of an operation is exactly what causes performance problems. Everything must be calculated dynamically because "you never can be sure" whether the environment has changed since the last time the result of the expression was needed.

    In the context of a single method, or a chain of methods, knowledge of a local environment (temps, arguments, etc.) can be exploited via Caching Temporary Variables or Caching Arguments. The decisions to use these types of Cacheable Expressions affect no one else but the execution stack which is pretty well secure from the rest of the system. Unfortunately, the system encompasses much more than this local environment and such techniques can not solve all performance problems.

    Yet, we still want to avoid caching expressions at a level that exposes the cache to the entire system, and therefore limits design possibilies and exposes integrity problems on a wide scale. For instance, if we kept the results of an expression for anyone to use in a global variable all methods that wanted to benefit from the results of the cache would have to know about its existence by name. Additionally, any actions anywhere in the system that would invalidate its value would have to explicitly reset it.

    You would like to encapsulate the Cacheable Expression in as low a level as possible that will affect all of the contexts which would like to exploit the cache. Classes, MetaClasses, and Class hierarchies each define an environment narrower than the system level but broader than the execution stack. Each define their own behavioral environment. Therefore:

    Solution

    Identify the behavioral environment (Class, Class hierarchy, or MetaClass) that encapsulates all of the contexts in which the Cacheable Expression can be shared before its value would be expected to be reset. Add a corresponding state variable (instance, class, or class instance). Give it the name of the message to be cached. Rename the message to be cached by prepending "compute" to the corresponding selector. Use lazy initialization to compute the value of the variable. Modify other parts of the behavioral environment to make sure this value is reset whenever some event occurs that could make the result obsolete.

    Discussion

    If you needed a solid dot icon, you could recompute it every time:

    dot
        ^"build the icon"

    Realizing the icon will be the same for all instances of all classes in the hierarchy, cache it in a class variable:

    dot
        Dot == nil
            ifTrue: [Dot := self computeDot].
        ^Dot

    If the dot size needed to change based on user preferences, a hook would have to be inserted to flush Dot whenever the preferences changed.

    Similarly, an instance method

    fullName
        ^last, ', ', first, ' ', middle

    could become

    computeFullName
        ^last, ', ', first, ' ', middle

    fullName
        fullName == nil
            ifTrue: [fullName := self computeFullName].
        ^fullName

    and methods that set the parts of the fullName would be modified to reset the cache, as the example below shows:

    last: aString
        last := aString.
        fullName := nil
    If you needed an icon to represent the on state for a particular class of button, but the icons were different for different classes of buttons in the hierarchy, you could implement:

    onIcon
        ^"build the icon"

    Realizing the icon will be the same for all instances of a given class, cache it in a class instance variable:

    onIcon
        ^self class onIcon

    class>>onIcon
        onIcon == nil
            ifTrue: [onIcon := "build the icon"].
        ^onIcon

    Clearing the cache of class instance variables may require some awkward code in the root class to ensure that all caches are cleared simultaneously:

     

    class>>resetClassCache
        self withAllSubclassesDo: [:each | each cacheVar: nil]

    Epilog

    With any Cached State Variable, there is the potential to eliminate design possibilities or invalidate previous class design assumptions. Be careful to adjust your Concise Design if the implications warrant.

    Simplification

    Prolog

    You notice several methods that are Hot Spots in a certain context but that cannot be simplified while still maintaining the integrity of the system for all instances of the same class.

    Problem

    How do we improve the Hot Spots for the context in question without causing undesirable side-effects in other contexts where instances of the same class are used?

    Forces

    Changing a method in such a way as to enable unadvertised side-effects under certain conditions, no matter how remote, is inviting trouble.

    Case statements could be inserted to detect a particular context. However, such statements are ugly and often slow down execution for the more general cases due to the overhead of case checking. Therefore:

    Solution

    Examine the features of a class that are actually being used by a particular instance or group of instances. Determine if there is another existing class or a simplified version of the class being used which could provide the desired features more efficiently.

    Discussion

    There are many examples of existing classes that behave similarly for many operations, yet offer distinct performance advantages.

    Arrays are generally more efficient than OrderedCollections. They take up less space, don’t carry extra instance variables to manage the beginning and end of the collection, and avoid a level of indirection in accessing their indexed instance variables.

    Symbols, once created, are much faster to compare than Strings: comparing Symbols merely compares object pointers, as opposed to comparing each individual element contained by the String.

    Additionally, one can often create classes that are simplifications of more flexible classes when that flexibility is not needed. For example, ComposedText in VisualWorks will handle various composition features of text (word-wrapping, tabs, changes of fonts, etc.). If you simply want to display a plain ASCII String in a single font, with no special characters or composition considerations, you could create a simple class, such as a VisualString, which simply calls the primitive for displaying a string in a particular font without any intermediate effort.
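    The Symbol-versus-String force above can be sketched directly. This is a minimal illustration using only standard protocol; the selector name in the literals is arbitrary:

```smalltalk
"String equality must walk each character; Symbols are unique,
so equality reduces to a single identity (pointer) test."
| name |
name := 'displayString' copy.
name = 'displayString'.             "true, after comparing 13 characters"
#displayString == #displayString    "true, via one pointer comparison"
```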

    Epilog

    Often after Simplification you may find that other areas of your system are using more complex classes than they need. You may wish to revisit your Concise Design to delete classes that are now redundant.

    Transient Reduction

    Prolog

    You have some set of objects, A, and want some set of objects, B.

    Problem

    How do I get from A to B in the most efficient manner?

    Forces

    Objects are wonderful, and they can do wonderful things for us very elegantly. Encapsulation helps hide the complexity of what's happening behind the scenes when we want some result. There may be tens or hundreds of thousands of objects created in helping to determine the desired result that we never even see. These objects give their lives for our result, and we don't even know they ever existed. Some thanks they get.

    Ah, but their dirty little secret is they don't like going unnoticed, so they leave their mark by gobbling up CPU cycles. Creation here, a few messages sent there, a little garbage collection to clean up the mess and eventually someone might notice that more time has gone by than they had planned.

    Solution

    Reduce the number of transient objects in a given operation.

    Discussion

    A client of ours was once complaining about Smalltalk and how the garbage collector was constantly interrupting and taking up too much time during many of their graphical programming operations. After analyzing the situation, we found that moving a particular object from one place to another caused tens of thousands of other objects to be created, all short-lived and mostly unnecessary. The problem wasn’t the efficiency of garbage collection, but the constant garbage creation.
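    The shape of that client's problem can be sketched in miniature. The figures collection and its moveTo:/moveBy: messages are hypothetical; the point is the Points:

```smalltalk
"Each evaluation of (each origin + delta) builds a short-lived Point
that becomes garbage immediately after the move. An in-place moveBy:
(see Object Transformation) creates no transients at all."
| delta |
delta := 2 @ 3.
figures do: [:each | each moveTo: each origin + delta].  "one transient Point per figure"
figures do: [:each | each moveBy: delta]                 "no transient Points"
```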

    Epilog

    This can be accomplished using several other patterns, e.g. [Transform Objects], [Caching Argument], [Pre-allocated Collection], [Concatenating Stream]

    Object Transformation

    Prolog

    I have some set of objects, A, and I want some set of objects, B, of the same classes. Once I have B, I no longer need A.

    Problem

    How do I get from A to B in the most efficient manner?

    Forces

    There is overhead in creating new objects and in garbage collecting the old ones.

    Encapsulation should be preserved.

    Solution

    Instead of creating new objects to replace the old objects, transform the old objects into the objects you desire.

    Discussion

    For example: if you have a Rectangle of a particular shape and position, and you want to replace it with a new Rectangle of similar shape but different position, you could simply send it the message translatedBy: with the appropriate argument to get the resulting Rectangle. However, this creates a new Rectangle and two new Points, causing the old Rectangle and its two Points to be garbage collected. Instead, you could send the message moveBy:

     

    moveBy: aPoint
        origin := origin + aPoint.
        corner := corner + aPoint

    which results in the old Rectangle being transformed into the new one, creating one less object and adding one less object to the garbage heap. If one really wanted to throw caution to the wind, and was sure the original Rectangle was the only object referring to the Points held by its origin and corner variables, one could write a method (and send the corresponding message) that reused the same Points and therefore created no new objects nor added any objects to the garbage heap. E.g.

     

    transformBy: aPoint
        origin x: origin x + aPoint x.
        origin y: origin y + aPoint y.
        corner x: corner x + aPoint x.
        corner y: corner y + aPoint y

    Epilog

     

    Hypoth-a-sized Collection

    Prolog

    A Hot Spot is time spent adding objects to a collection.

    Problem

    How can I make this more efficient?

    Forces

    Collections are designed so that the details of how they hold all of the objects placed in them are hidden. Most collections start out with relatively few slots in which to place objects, so as to avoid wasting space. When a collection needs more space, it simply allocates a bigger chunk of space, copies the objects previously held into the new space, and no one is the wiser... except the profiler.

    When adding many objects to a relatively small collection, much of the time is spent growing incrementally larger buffers and copying objects from the old to the new buffer.

    Solution

    When you realize an operation is bound to add a significant number of objects to a collection, make sure the collection is big enough, based on a hypothesis, to hold the results without growing, or at least with minimal growing.

    Discussion

    E.g. instead of something like:

    nonBaseClasses
        | collection |
        collection := Set new.
        Smalltalk allClassesDo:
            [:each |
            each superclass == nil
                ifFalse: [collection add: each]].
        ^collection

    allocate enough space in the collection to avoid growth:

     

    nonBaseClasses
        | collection |
        collection := Set new: Smalltalk classNames size.
        Smalltalk allClassesDo:
            [:each |
            each superclass == nil
                ifFalse: [collection add: each]].
        ^collection

    Epilog

     

    Concatenating Stream

    Prolog

    You may be collecting results in a Collecting Temporary Variable. You may be collecting results over several methods in a Collecting Parameter.

    Problem

    How do you concatenate several Collections efficiently?

    Forces

    Concatenation (,) is a simple way to join several Collections. When you have lots of Collections, though, it can be slow because the objects in the first Collection are copied once for each Collection that is concatenated.

    Streams provide a way out of this dilemma. Streams are careful not to copy their contents many times.

    Solution

    Use a Stream to concatenate many Collections.

    Discussion

    We ran a quick benchmark to see how significant the difference was between Collection concatenation and Streams. We ran the following code, which concatenates one thousand Strings:

     

    100 timesRepeat:
        [result := String new.
        1000 timesRepeat: [result := result , 'abcdefg']]

    It took 53 seconds to run. Then we ran the Stream version:

     

    100 timesRepeat:
        [writer := WriteStream on: String new.
        1000 timesRepeat: [writer nextPutAll: 'abcdefg'].
        writer contents]

    It took 0.9 seconds to run. The difference is a factor of almost 60.

    Sometimes you will use Concatenating Stream not because it is your choice, but because you have to fit into other code that already uses it. The most common example is specializing object printing by overriding printOn:.

    Note that this pattern can also be combined with Hypoth-a-sized Collection to create a Stream whose collection can avoid growing.
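    A minimal sketch of that combination, using the benchmark above: pre-size the Stream's underlying collection so it never grows (7000 characters = 1000 iterations of 7 characters each):

```smalltalk
"WriteStream on: starts writing at the beginning of the collection,
and contents answers only the written portion, so the pre-sized
String is filled in place rather than grown and copied."
| writer |
writer := WriteStream on: (String new: 7000).
1000 timesRepeat: [writer nextPutAll: 'abcdefg'].
writer contents
```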

    Epilog

     

    Ken Auer <[email protected]> 


    Copyright © 2002-03 by RoleModel Software, Inc. All rights reserved.