Apr 20th 2008 08:38 pm
T-SQL Split - An xml based approach
When we select information from our database we have to be careful to filter only results that the user has access to, e.g. an user has access to user information in a fixed list of departments:
Select * from Employee where department_id in (1,3,6)
The problem is comes when we try to parameterize our query because T-SQL requires a parameter for each item in the “IN” clause. When we use our OR/M (nHibernate) and the setParameterList method, it creates a query like this:
sp_executeSql(n'Select * from Employee where department_id in (@p0,@p1,@p2)', '@p0 int, @p1 int, @p2 int', @p0=1,@p1=3,@p2=6);
This has a few issues:
- As the list size increases, query performance degrades due to the number of parameters.
- We risk throwing a “Too many parameters were provided in this RPC request. The maximum is 2100″ error depending on the number of items in the list(s) the query uses.
- These queries cannot be cached effectively because of the variations in parameters.
We considered building a list of comma delimited ids, passing that to SQL as an nvarchar parameter, and using a user defined function like this:
Select * from SomeTable where id in (dbo.fnSplit(@ids))
After some investigation, however, we found no standard way to implement the fnSplit function. Everybody has their own version optimized for list size, ordering, and format.
Instead we serialized the list to xml (pretty easy in.net) and deserialized it using:
CREATE FUNCTION [dbo].[fn_SplitIntsFromXmlNoPrimaryKey](@input varchar(max)) RETURNS @retArray TABLE (id int) AS BEGIN DECLARE @xml xml SET @xml = @input insert into @retArray SELECT T.id.value(’.', ‘int’) AS id FROM @xml.nodes(’/ArrayOfInt/int’) T(id) RETURN END
select * from [dbo].[fn_SplitIntsFromXmlNoPrimaryKey] ( ‘33 72 74 76 78 80 80 ‘)
For larger numbers of ids, performance degraded due to the table scan:
So we optimized the function by introducing a primary key on the ints:
CREATE FUNCTION [dbo].[fn_SplitIntsFromXml](@input varchar(max)) RETURNS @retArray TABLE (id int Primary Key) AS BEGIN DECLARE @xml xml SET @xml = @input insert into @retArray SELECT T.id.value(’.', ‘int’) AS id FROM @xml.nodes(’/ArrayOfInt/int’) T(id) RETURN END
select * from [dbo].[fn_SplitIntsFromXml] ( ‘33 72 74 76 78 80 ‘)
For several thousand ids, the performance difference was quite dramatic at 1 second versus 13 seconds. Although our testing was light it was conclusive- a primary key allows SQL to avoid using a heap for storing/searching the table valued function result and this is much faster. Since we’re using a single parameter per IN clause, we are no longer limited by the RPC parameter limit, the query plan can be cached, and performance is within limits.
Happy coding.
No Comments yet »