May 27th 2008 05:29 pm

Class Table Inheritance: An Investigation into SQL Performance

Class Table Inheritance is a powerful tool when persisting a strong object oriented business model, but what happens when the data size grows? If we know the type of object we’re selecting, the query is straightforward and performs well:

select
	 base_class.id
	,base_class.sub_type
	,inheriting_class.column1
..
	,inheriting_class.columnn
from
	base_class inner join inheriting_class on base_class.id = inheriting_class.base_class_id

Using multiple tables, requires an inner join to select data and multiple updates to persist it. Outside high volume transaction applications, however, this cost this is negligible.
The problem begins when we need to select a potentially heterogeneous set of records. The longer the inheritance chain and the larger number of sub-classes the less performant our queries will be.
Suppose we have a relatively simple object model as follows:

click the links for scripts to create and populate.

In order to select a set of records “inheriting” from base_class, the business layer or OR/M will generate a query similar to the following (note I’m not selecting all the records, just a subset akin to the filters one would use in a real application):

select
	 base_class.id
	,A.base_class_id
	,B.base_class_id
	,C.base_class_id
	,case when A.base_class_id is not null then 1 when B.base_class_id is not null then 2 when C.base_class_id is not null then 3 end as clazz_
from
	base_class left join inheriting_class_a as A on base_class.id = A.base_class_id
	left join inheriting_class_b as B on base_class.id = B.base_class_id
	left join inheriting_class_c as C on base_class.id = C.base_class_id
where base_class.id % 7 = 1

On average the query takes 700ms on my machine (dual quad-core x64 xp). If we add in our knowledge of the sub-type into the join, the performance does not improve.

select
	 base_class.id
	,A.base_class_id
	,B.base_class_id
	,C.base_class_id
	,case when A.base_class_id is not null then 1 when B.base_class_id is not null then 2 when C.base_class_id is not null then 3 end as clazz_
from
	base_class as base_class left join inheriting_class_a as A on base_class.id = A.base_class_id and base_class.sub_type = 2
	left join inheriting_class_b as B on base_class.id = B.base_class_id and base_class.sub_type = 0
	left join inheriting_class_c as C on base_class.id = C.base_class_id and base_class.sub_type = 1
where base_class.id % 7 = 1

In fact the query execution plan is still the same despite a unique composite index on id and sub_type. The left joins are simply expensive. We need them to return heterogeneous results but the cost has become prohibitive. The longer the inheritance chain and the larger the number of sub-types, the more drastically the performance will degrade beyond this simple example.
One workaround is to perform the expensive outer joins on a smaller subset of date. If we execute the same query using a Common Table Expression and filter the rows before doing the outer joins, the cost decreases by 50% to an average of 371ms.

select
	 base_class.id
	,base_class.sub_type
	,C.base_class_id
	,B.base_class_id
	,C.base_class_id
	,case when A.base_class_id is not null then 1 when B.base_class_id is not null then 2 when C.base_class_id is not null then 3 end as clazz_
from
	base_class as base_class left join inheriting_class_a as C on base_class.id = C.base_class_id
	left join inheriting_class_b as B on base_class.id = B.base_class_id
	left join inheriting_class_c as C on base_class.id = C.base_class_id
where base_class.id % 7 = 1

By comparison, using Single Table inheritance (click links for create and populate scripts), the performace on the following:

	select id, 1 as clazz_
	from sub_class_a
	where id %7 = 1
union all
	select id, 2 as clazz_
	from sub_class_b
	where id %7 = 1
union all
	select id, 3 as clazz_
	from sub_class_c
	where id %7 = 1

is far better than that of the Class Table Inheritance at an average of 140ms. It’s important to note that performance will vary dramatically depending on the size of your data, the “selectiveness” of your where clause and other specifc circumstances.
From the perspective of query performance, Single Table Inheritance is far more efficient than Class Table Inheritance. Within the context of Class Table Inheritance, what is not so obvious is that the cost of returning heterogenous lists or simply not knowing the sub-type at query time can be quite high. Filtering the records (if possible) before applying the outer joins can drastically improve performance.

No Comments yet »

Trackback URI | Comments RSS

Leave a Reply

« Howto: avoid binding the same eventhandler multiple times to an element event in the MS AJAX framework | Time based Caching in a Load Balanced ASP.Net Application »