Apr 20th 2008 08:38 pm

T-SQL Split - An xml based approach

When we select information from our database we have to be careful to filter only results that the user has access to, e.g. an user has access to user information in a fixed list of departments:

Select * from Employee where department_id in (1,3,6)

The problem is comes when we try to parameterize our query because T-SQL requires a parameter for each item in the “IN” clause.  When we use our OR/M (nHibernate) and the setParameterList method, it creates a query like this:

sp_executeSql(n'Select * from Employee where department_id in (@p0,@p1,@p2)',
'@p0 int, @p1 int, @p2 int',
@p0=1,@p1=3,@p2=6);

This has a few issues:

  1. As the list size increases, query performance degrades due to the number of parameters.
  2. We risk throwing a “Too many parameters were provided in this RPC request. The maximum is 2100″ error depending on the number of items in the list(s) the query uses.
  3. These queries cannot be cached effectively because of the variations in parameters.

We considered building a list of comma delimited ids, passing that to SQL as an nvarchar parameter, and using a user defined function like this:

Select * from SomeTable where id in (dbo.fnSplit(@ids))

After some investigation, however, we found no standard way to implement the fnSplit function. Everybody has their own version optimized for list size, ordering, and format.

Instead we serialized the list to xml (pretty easy in.net) and deserialized it using:

CREATE FUNCTION [dbo].[fn_SplitIntsFromXmlNoPrimaryKey](@input varchar(max))
RETURNS @retArray TABLE (id int)
AS
BEGIN
	DECLARE @xml xml
	SET @xml = @input
	insert into @retArray
	SELECT   T.id.value(’.', ‘int’) AS id
	FROM @xml.nodes(’/ArrayOfInt/int’) T(id)
	RETURN
END
select * from [dbo].[fn_SplitIntsFromXmlNoPrimaryKey] (
‘
  33
  72
  74
  76
  78
  80
80
 ‘)

For larger numbers of ids, performance degraded due to the table scan:
fnSplitWithNoPrimaryKey
So we optimized the function by introducing a primary key on the ints:

CREATE FUNCTION [dbo].[fn_SplitIntsFromXml](@input varchar(max))
RETURNS @retArray TABLE (id int Primary Key)
AS
BEGIN
	DECLARE @xml xml
	SET @xml = @input
	insert into @retArray
	SELECT   T.id.value(’.', ‘int’) AS id
	FROM @xml.nodes(’/ArrayOfInt/int’) T(id)
	RETURN
END
select * from [dbo].[fn_SplitIntsFromXml] (
‘
  33
  72
  74
  76
  78
  80
 ‘)

fnSplitWithPrimaryKey
For several thousand ids, the performance difference was quite dramatic at 1 second versus 13 seconds. Although our testing was light it was conclusive- a primary key allows SQL to avoid using a heap for storing/searching the table valued function result and this is much faster. Since we’re using a single parameter per IN clause, we are no longer limited by the RPC parameter limit, the query plan can be cached, and performance is within limits.
Happy coding.

No Comments yet »

Trackback URI | Comments RSS

Leave a Reply

« Concatenating values in t-sql select statements | What I wish I knew about Service Broker before I started using it »