Jeff (moonlessnights) wrote in ood,

Beyond IMP-caching: Hiding Virtual Dispatch Cost on Repeated Operations

The cost of virtual method dispatch seems to be an especially problematic sticking point when it is used to access a simple implementation over and over. A familiar example is iterating a data set where the method to access the next element of a complex data model is frequently implemented via virtual dispatch (since iterator interfaces and mechanics are usually completely common with the exception of a single, usually very small, look-up method). For those of us familiar with Mac OS X, a good example of a pattern to resolve this is the NSFastEnumeration formal protocol.

The pattern underlying the protocol is a simple, scalable, and relatively elegant solution to the problem of a high virtual dispatch cost becoming a multiplier in the expense of a linear operation: instead of n invocations, each with 1 result, make 1 invocation with n results. From the point of view of the CPU, it merely shrinks the size of an inner-loop by removing a logical invariant (the virtual dispatch and function call).

The implementation of a solution based on this pattern is simple: allocate a large output buffer on the caller's stack and get the one virtual dispatch to populate as much of it as possible, instead of only returning one value. Then, in the caller, loop over the buffer and operate on the iterator's output without needing to call it for each step. An additional state structure is allocated in the caller's stack frame to allow the iterator to pick up where it left off (unless the situation allows the state to be preserved within the iterator instance).

While iteration is the obvious application of the pattern, it can apply to various other problem spaces (although it is most obviously beneficial when the cost of the virtual dispatch is a substantial part of the call expense). Other examples which come to mind are numerical sequences (if computing the next value in a sequence is simple, compute the next n to avoid n-1 invocations), simple data transformations (pass in both an input set and an output set - moves the loop inside the virtual dispatch instead of including it as part of the loop's iteration cost), etc.

The idea seemed worth pointing out as I have too often seen ugly designs used to get around the feared cost of repeated virtual dispatches (or even function calls) and this pattern is a far cleaner way to see a substantial performance boost without compromising the cleanliness of your design.


Edit: After thinking through it, I see why Apple's iterator would need external state: it allows the user code to exit the iterator loop early and the next iteration attempt to know that it must start over which would require additional explicit calls (which would not work in Apple's syntactic sugar) unless the state is recreated when the next loop is entered.
  • Post a new comment


    default userpic