Jeff (moonlessnights) wrote in ood,

Making heavy abstractions easier to swallow with opaque factories

I was recently working on a project that involved mining several data structures out of a binary data file. Each structure was a complicated entity with several attributes, and they were stored as a stream of records at some offset into the file (finding this offset took several seek operations, as the index was not written with easy readability in mind).

The problem we faced was that we wanted the code responsible for reading this file to be reusable and have no knowledge of the higher-level objects in the specific application (in short, we wanted it designed well). We were actually rewriting this program to replace one that had no defined boundaries between its components. In that old program, the low-level reader component created instances of the high-level objects directly as it iterated over the stream of records. In our implementation, this would not do. We had some ideas for how to do this better:

1) Give the low-level component a very large API so that we could query each attribute by some kind of abstract record ID.
PROS: the division between the components is very easy to see and makes no assumptions about behaviour on either side of the interface
CONS: implementing and working with such a large interface would take a lot of code, and internal performance would be terrible since the speed of seek-less streaming reads from the file would never be realized (some of it could be recovered through clever caching, but that starts to make assumptions about how the interface is being used)

2) Allow the low-level component to create its own representations of the objects and then return them to the higher-level component which would then deconstruct and reconstruct the objects as it sees fit (the simplest but loosest way of doing this is with some sort of dictionary for each record).
PROS: the interface stays nice and small, performance is still good since the low-level is permitted to read the file in whichever way is most effective
CONS: the data becomes so hidden that it may be difficult to see what is wrong when a bug is found; all the unwrapping and reconstruction code is tedious to write, error-prone, and potentially large; the entire stream must be read into memory before the collection is returned, which may be unwarranted in some applications

3) Create an interface (this was Java so I will use Java terms) in the low-level which can be implemented by a factory in the high-level and passed in. The low-level can then ask the factory to create the instances for it, as it finds records in the data stream. It will collect and return these objects created by the factory without any knowledge of what they actually are (they could also adhere to some other interface which the low-level would use to work with them but treating them as totally opaque java.lang.Object instances was perfect for our needs).
PROS: allows for a simple interface (easily one method, "process(Factory)"); no assumptions are made about how either side is working; future complexities can be offloaded to the factory (configuration options, etc., need not be propagated down to the low-level if the factory knows about them and has the right exposure to apply them); the streaming nature of the storage can be leveraged by the high-level component, since the factory methods are called as the data is processed (consider, for example, generating an HTML report of the contents simply by streaming UTF-8 data from the factory whenever the methods it is interested in are called - the memory footprint stays only big enough to contain one record even as n are processed).
CONS: may be overkill for simple records
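
To make option 3 a bit more concrete, here is a minimal sketch of the shape it takes. This is not our actual code: every name (RecordFactory, StreamReader, Customer, CustomerFactory) is made up for illustration, and the "file" is just an array of "id:name" strings standing in for the binary stream.

```java
import java.util.ArrayList;
import java.util.List;

// Declared in the low-level component (hypothetical name and signature).
interface RecordFactory {
    // Called once per record found in the stream; the result is opaque to the reader.
    Object createRecord(int id, String name);
}

// Low-level reader: knows the storage layout, knows nothing about application types.
final class StreamReader {
    private final String[] rawRecords; // assumption: stand-in for the binary file

    StreamReader(String... rawRecords) {
        this.rawRecords = rawRecords;
    }

    // The whole public surface: one method that takes the factory.
    List<Object> process(RecordFactory factory) {
        List<Object> results = new ArrayList<>();
        for (String raw : rawRecords) {
            String[] fields = raw.split(":", 2);
            // The reader hands over the attributes and stores whatever comes back.
            results.add(factory.createRecord(Integer.parseInt(fields[0]), fields[1]));
        }
        return results;
    }
}

// High-level object, invisible to the reader.
final class Customer {
    final int id;
    final String name;

    Customer(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// High-level factory, passed down into the reader.
final class CustomerFactory implements RecordFactory {
    @Override
    public Object createRecord(int id, String name) {
        return new Customer(id, name);
    }
}

final class OpaqueFactoryDemo {
    public static void main(String[] args) {
        StreamReader reader = new StreamReader("1:Ada", "2:Grace");
        for (Object o : reader.process(new CustomerFactory())) {
            Customer c = (Customer) o; // only the high-level knows the real type
            System.out.println(c.id + " " + c.name);
        }
    }
}
```

The cast back to Customer is the price of treating the results as plain java.lang.Object; a generic type parameter on the interface would remove it, but that is a variation on what is described above.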

Needless to say, we went with option 3 and were surprised by the later enhancements we could make and the ease with which other programs could reuse the component.

Enhancements: It became useful to allow the data file to be fractured into smaller pieces, and we were trivially able to extend the factory interface to let the higher-level take control of the search path and open-file handling for these other files.
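
A rough sketch of what such an extension could look like, under my own assumptions: the method name openPiece, the one-record-per-line format, and the in-memory "files" are all invented here to keep the example small and runnable. The point is only that the reader asks the factory for each piece instead of resolving paths itself.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical extended factory: the high-level now also decides where pieces live.
interface PieceAwareFactory {
    InputStream openPiece(String pieceName) throws IOException;

    Object createRecord(String payload);
}

// Low-level reader: no search path, no file handling of its own.
final class SplitReader {
    List<Object> process(List<String> pieceNames, PieceAwareFactory factory) {
        List<Object> results = new ArrayList<>();
        for (String pieceName : pieceNames) {
            // Closing the reader also closes the stream the factory opened.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    factory.openPiece(pieceName), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    results.add(factory.createRecord(line)); // assumption: one record per line
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return results;
    }
}

// A factory that serves pieces from memory; a real one might search directories instead.
final class InMemoryFactory implements PieceAwareFactory {
    private final Map<String, String> pieces;

    InMemoryFactory(Map<String, String> pieces) {
        this.pieces = pieces;
    }

    @Override
    public InputStream openPiece(String pieceName) throws IOException {
        String content = pieces.get(pieceName);
        if (content == null) {
            throw new FileNotFoundException(pieceName);
        }
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public Object createRecord(String payload) {
        return "record:" + payload;
    }
}
```

A factory like InMemoryFactory is also handy for testing, since the reader can be exercised without touching the disk.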

Other components: We were able to easily write unit tests and small helper tools built on the low-level by selectively implementing the parts of the factory interface needed for a specific implementation.
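
As an illustration of the "selectively implementing" idea, here is one way it might look. The names are hypothetical, and the default no-op methods are my assumption (they need Java 8 or later; an abstract adapter class with empty methods gives the same effect on older versions). The helper below only counts records and ignores everything else the reader reports.

```java
import java.util.List;

// Hypothetical factory interface with more than one hook; each hook does nothing by default.
interface StreamFactory {
    default Object createHeader(String description) {
        return null;
    }

    default Object createRecord(int id, String name) {
        return null;
    }
}

// Stand-in for the low-level reader; the input is "id:name" strings instead of a binary file.
final class MiniReader {
    private final List<String> rawRecords;

    MiniReader(List<String> rawRecords) {
        this.rawRecords = rawRecords;
    }

    void process(StreamFactory factory) {
        factory.createHeader(rawRecords.size() + " records");
        for (String raw : rawRecords) {
            String[] fields = raw.split(":", 2);
            factory.createRecord(Integer.parseInt(fields[0]), fields[1]);
        }
    }
}

// A small helper tool: overrides only the hook it needs and keeps no records in memory.
final class CountingFactory implements StreamFactory {
    int count;

    @Override
    public Object createRecord(int id, String name) {
        count++;
        return null; // nothing to build; the count is all this tool wants
    }
}
```

A unit test for the reader can use the same trick: implement just the hook under test and assert on what it was called with.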

For the other OS X guys out there, this is also similar to how the entire NSCoder hierarchy works. That example is also interesting in that it uses the streaming advantage to write partially encoded data, as it runs.