Patterns of Enterprise Application Architecture - Data Persistence

Mar-2005

1. Introduction

This document describes a set of experiments carried out to answer questions, some of them raised by colleagues, about the behavior and functionality of a C++ framework (hereafter referred to as TPF).

2. Experiment 1: Persistence of a heterogeneous collection in the database

It is well known that TPF can persist a homogeneous collection, in which all objects in the container are of the same type and are stored in the same table. In one of our discussions, we talked about the possibility of inserting objects of different types into a collection and storing that collection. Although this was not a required feature in any of the TPF-based applications, a test was carried out to verify the behavior.
Let’s look at how a relationship is mapped and implemented by the framework.

Figure 1: Relational Mapping

The above mapping schema supports one-to-one, one-to-many and many-to-many relationships. The fromId column in the Bridge table maps to the Id column in the Parent table, while the toId column maps to the Id column of the ChildrenType1 table. When a parent object contains N ChildrenType1 objects, N rows are inserted into the Bridge table and N rows are created in the ChildrenType1 table.

Here is another way to express the relationship between Parent and ChildrenType1, in the form of a SQL query:

select * from parent p, childrentype1 c, bridge b
where p.id = b.fromid and
b.toid = c.id and p.id = 'anId'
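
The same join can be sketched in C++ over in-memory rows. This is only an illustration of the bridge lookup, not the framework's code: BridgeRow, ChildRow and childrenOf are hypothetical names, and vectors stand in for the Bridge and ChildrenType1 tables.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins for the Bridge and ChildrenType1 tables.
struct BridgeRow { std::string fromId; std::string toId; };
struct ChildRow  { std::string id; std::string field1; };

// Returns the children reachable from parentId through the bridge,
// mirroring the conditions b.fromid = p.id and b.toid = c.id.
std::vector<ChildRow> childrenOf(const std::string& parentId,
                                 const std::vector<BridgeRow>& bridge,
                                 const std::vector<ChildRow>& children) {
    assert(!parentId.empty() && "parent id must not be empty");
    std::vector<ChildRow> result;
    for (const BridgeRow& b : bridge) {
        if (b.fromId != parentId) continue;       // bridge row of another parent
        for (const ChildRow& c : children) {
            if (c.id == b.toId) result.push_back(c);
        }
    }
    return result;
}
```

Each bridge row contributes at most one child, so a parent with N bridge rows yields N children, as described above.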

A similar mapping is described by Martin Fowler as “Association Table Mapping”[1]. It is interesting to note, however, that the framework was written long before his book was published.
In this framework, a parent object may also have multiple compositions: an object can contain several collections as its members. In that case, the mapping looks like the following.


Figure 2: One parent with many kinds of children connected through single bridge table

Our question was: what happens if a parent object contains a collection that holds objects of different types, that is, a heterogeneous container?
The experiment showed that when a collection in a parent contains different kinds of objects, each object is still stored correctly in its own table, as defined in the object’s class metadata. The pseudo code looks like this:

ParentObj* parent = new ParentObj();
parent->PField1("somevalue");
parent->PField2("anothervalue");

Ordered* anOrdered = new Ordered(); // will be heterogeneous
anOrdered->insert(new ChildrenType1("CField1Value", "CField2Value"));
anOrdered->insert(new ChildrenType2("C2Field1Value", "C2Field2Value", "C2Field1Value"));
anOrdered->insert(new ChildrenType3());

parent->setChildren(anOrdered);
parent->store(); // actual insert into database takes place here

Upon a successful store, one row is inserted into the Parent table, three rows into the Bridge table, and one row each into ChildrenType1, ChildrenType2 and ChildrenType3.
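The row counts can be reproduced with a small simulation. The sketch below is not TPF code: Persistent, the two child structs and storeParent are hypothetical, and it counts rows per table instead of issuing SQL. It assumes that each class can name its own table, which is the role the class metadata plays in the framework.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical persistent base class: every concrete class names its own table.
struct Persistent {
    virtual ~Persistent() = default;
    virtual std::string tableName() const = 0;
};
struct ChildrenType1 : Persistent {
    std::string tableName() const override { return "ChildrenType1"; }
};
struct ChildrenType2 : Persistent {
    std::string tableName() const override { return "ChildrenType2"; }
};

// Number of rows written per table; used here in place of a real database.
using RowCounts = std::map<std::string, int>;

// Simulates storing one parent with a possibly heterogeneous child collection:
// one Parent row, then one Bridge row and one child-table row per child.
RowCounts storeParent(const std::vector<std::unique_ptr<Persistent>>& children) {
    RowCounts rows;
    rows["Parent"] = 1;
    for (const auto& child : children) {
        assert(child && "collection must not hold null children");
        ++rows["Bridge"];
        ++rows[child->tableName()];   // dispatch on the child's own type
    }
    return rows;
}
```

The point of the sketch is that the collection itself does not need to know the child types; the target table is decided per object.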
It is worth noting that when fetching a parent object, the framework knows where to fetch the objects of the heterogeneous children collection. The fetch is implemented in this framework as follows:

  • Fetch the parent row
  • Fetch each toId from the bridge table where fromId = parent.id
  • For each toId found in the bridge table, if not in lazy fetch mode, resolve the corresponding table name from the first segment of toId
  • Fetch the row from the target table where id = toId
  • Construct the parent object and the child objects from the rows fetched

It was also noted that the id column of a persistent object in this framework consists of four segments. The first segment is the class id of the object. The class id is always defined in the metadata of the persistent class, so the framework can relate the class id back to the metadata, where the Oracle table is also defined.
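The table resolution step can be sketched as follows. The sketch makes two assumptions that the experiment did not establish: that the id segments are separated by a '.' character, and that the metadata can be modeled as a simple map from class id to table name. The names classIdOf, tableFor and MetadataRegistry are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumption: the four id segments are separated by '.', e.g. "101.0.0.7".
// The real separator and the meaning of the other segments are not known here.
std::string classIdOf(const std::string& objectId) {
    const std::string::size_type dot = objectId.find('.');
    assert(dot != std::string::npos && "id must have more than one segment");
    return objectId.substr(0, dot);   // first segment = class id
}

// Hypothetical stand-in for the class metadata: class id -> table name.
using MetadataRegistry = std::map<std::string, std::string>;

// Resolves the table holding the row for objectId; returns "" if the class is unknown.
std::string tableFor(const std::string& objectId, const MetadataRegistry& registry) {
    const auto it = registry.find(classIdOf(objectId));
    return it == registry.end() ? std::string() : it->second;
}
```

With this lookup, the toId values read from the bridge table are enough to decide which child table to query, without storing a table name in the bridge row.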
This kind of mapping is a powerful and flexible approach. Like any other technology, it has its weaknesses too. As tables grow (to hundreds of millions of rows), the performance of this mapping is likely to suffer because of the extra join to the bridge table in the one-to-one and one-to-many cases. For many-to-many relationships, the bridge is unavoidable. Nevertheless, all columns in the Bridge table are indexed for forward searching (parent to children).

3. Experiment 2: Lazy Fetch

Our framework supports a mechanism called “lazy fetch”, as do many other commercial and open source persistence frameworks. It also has a reference manager that keeps track of object instances within a process thread. Lazy fetch and the reference manager work together to optimize resource usage.

Reference Manager
A reference manager keeps track of objects in memory. Before an object is fetched from its data store, the framework checks the reference manager to see whether a copy already exists in memory. If an instance is found, a reference to it is returned to the requesting program without hitting the database. If the requested instance is not found in the reference manager, the framework issues a SQL fetch for the id column of the object, updates the reference manager and returns the object reference to the caller. This mechanism saves unnecessary database round trips.
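The lookup described above resembles what Fowler calls an Identity Map. A minimal sketch, assuming a single thread and a caller-supplied loader function in place of the SQL fetch, could look like this. ReferenceManager, PersistentObject and Loader are hypothetical names here and do not reflect the framework's real interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct PersistentObject { std::string id; };

// Hypothetical reference manager: keeps one shared instance per object id.
class ReferenceManager {
public:
    using Loader = std::function<std::shared_ptr<PersistentObject>(const std::string&)>;

    explicit ReferenceManager(Loader loader) : loader_(std::move(loader)) {
        assert(loader_ && "a loader is required");
    }

    // Returns the cached instance, or loads and caches it on a miss.
    std::shared_ptr<PersistentObject> get(const std::string& id) {
        const auto it = cache_.find(id);
        if (it != cache_.end()) return it->second;            // hit: no round trip
        std::shared_ptr<PersistentObject> obj = loader_(id);  // miss: go to the data store
        cache_[id] = obj;
        return obj;
    }

private:
    Loader loader_;
    std::map<std::string, std::shared_ptr<PersistentObject>> cache_;
};
```

Requesting the same id twice returns the same instance and calls the loader only once, which is the saving the reference manager is meant to provide.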

Lazy Fetch
When an object instance is requested, the framework by default fetches only a reference from the database. This reference consists of only the id field of the object in the table. The SQL statement used to fetch this reference is as follows:

select id from table_of_object [where somefield = somevalue];

As one can see, this SQL statement is in its simplest form and consumes minimal network, processor and disk I/O resources. The reason is simple: it is not necessary to fetch the entire object row until its members are referenced. This is particularly useful when someone only wants to know whether an object exists in the database and does not need its other fields.
From a coding point of view, an object is fetched lazily when a CRef (a part of the framework) is used in conjunction with a persistent class. For example:

// Only the id field of the ParentObject matching whereClause is fetched.
CRef(ParentObject) po = ParentObject::fetchRefWhere(whereClause);
// -> is overloaded; this access issues another SQL statement to fetch the object body.
CString aString = po->getPField1();
// No further SQL fetch is issued.
CString anotherString = po->getPField2();

At first glance po appears to be a local object, and it is. But its use of the -> operator makes it look like a pointer to an object on the heap. The -> operator is overloaded to ‘intercept’ access to the members of ParentObject.
This operator contains the code that performs the reference manager lookup. If the entire row has not yet been fetched, the framework issues a SQL statement that fetches the entire row from the data store.

select * from ParentObject where [whereClause];

Upon return of the query, the framework updates the reference manager, constructs the object instance through the object factory and populates all fields as defined in the metadata.
Subsequent references to other fields of ParentObject do not cause the row to be fetched again, because the object already exists in the reference manager.
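The interception through the -> operator can be illustrated with a small template. This is a sketch of the general lazy-load technique, not the implementation of CRef: LazyRef, its Loader callback and the ParentObject struct below are hypothetical, and the reference manager lookup is left out to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for a persistent class with two fields.
struct ParentObject {
    std::string pField1;
    std::string pField2;
};

// Hypothetical lazy reference; it only imitates the behavior described for CRef.
template <typename T>
class LazyRef {
public:
    using Loader = std::function<std::unique_ptr<T>(const std::string&)>;

    LazyRef(std::string id, Loader loader)
        : id_(std::move(id)), loader_(std::move(loader)) {}

    // The first member access loads the full object; later accesses reuse it.
    T* operator->() {
        if (!body_) {
            body_ = loader_(id_);   // stands in for the "select *" fetch
            assert(body_ && "loader must return an object");
        }
        return body_.get();
    }

    bool isLoaded() const { return body_ != nullptr; }

private:
    std::string id_;            // the only data held before the first access
    Loader loader_;
    std::unique_ptr<T> body_;   // empty until the object body is fetched
};
```

Creating a LazyRef costs nothing beyond storing the id. The loader runs on the first use of ->, and a second field access through the same reference does not run it again.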

4. Experiment 3: Persistence of Inherited Objects

This experiment was performed to find out how the framework persists objects derived from another persistent object. There are several known methods of mapping such inheritance onto physical tables in a database. In his book (Patterns of Enterprise Application Architecture, PoEAA), Martin Fowler describes the major approaches, including the advantages and disadvantages of each.
Provided here is a summary of the possible implementations, as described in PoEAA.
  • Single Table Inheritance (Chapter 12, Object-Relational Structural Patterns)
    All fields of every class in the inheritance hierarchy reside in the same physical table.
  • Class Table Inheritance (Chapter 12, Object-Relational Structural Patterns)
    One database table per class in the inheritance structure, including the abstract base class.
  • Concrete Table Inheritance (Chapter 12, Object-Relational Structural Patterns)
    One database table per concrete class in the inheritance structure. The fields of the abstract base class are repeated in the table of each concrete class.
Our experiment showed that the framework implements the “Concrete Table Inheritance” mapping. The variation we observed in this framework is that it allows a derived object to modify the metadata of the base object, such as the field name, database column name and type.


Figure 3: Concrete Table Inheritance

The concrete tables are independent, which means that a change to attribute values in the base class table does not affect those in the derived class table. The derived object does not contain an instance of the base object within itself. We have not, however, seen this form of object inheritance used in the products.
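The way a derived class ends up with its own complete table can be sketched with plain metadata structures. The sketch is an assumption about the general idea only: Column, TableMapping and deriveMapping are hypothetical, and only the override of a database column name is modeled, not field names or types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical column metadata: object field name and database column name.
struct Column { std::string field; std::string column; };

struct TableMapping {
    std::string table;
    std::vector<Column> columns;
};

// Builds the mapping of a derived concrete class: it copies every base column
// into its own table, then applies overrides and appends the new columns.
TableMapping deriveMapping(const TableMapping& base, const std::string& table,
                           const std::vector<Column>& own) {
    assert(!table.empty() && "a concrete class needs its own table");
    TableMapping derived{table, base.columns};
    for (const Column& c : own) {
        bool overridden = false;
        for (Column& existing : derived.columns) {
            if (existing.field == c.field) {   // derived class redefines a base field
                existing.column = c.column;
                overridden = true;
            }
        }
        if (!overridden) derived.columns.push_back(c);
    }
    return derived;
}
```

Because the base columns are copied rather than referenced, the two mappings stay independent, which matches the independence of the concrete tables noted above.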

5. Summary

Conceptual Modeling
The experiments described in this document require an understanding of topics related to data modeling. We were able to describe a relationship between tables using SQL statements as well as data modeling diagrams, which shows that the concepts of classes, instances, generalization and data modeling are well understood. We were also aware that, in the case of Experiment 3 above, “Concrete Table Inheritance” causes data redundancy. Choosing “Concrete Table Inheritance” was a design decision made by the framework team. A good software architect makes such design decisions after careful consideration of factors such as performance trade-offs, design complexity and reporting needs.

[1] Martin Fowler – Patterns of Enterprise Application Architecture. An excerpt of this pattern is found at http://www.martinfowler.com/eaaCatalog/associationTableMapping.html