Today I had an issue in which I wanted to build out a tree of data. This information is collected in a single table with self joins to define the hierarchy. I wanted to avoid having to repeatedly query the database for a subset of the records - instead, I wanted to get the whole batch from the database, already organized and set up in the tree. I've done things such as this in Oracle via the CONNECT BY statement in the past, but unfortunately SQL Server (in its current incarnation) doesn't support this type of functionality; therefore, I had to resort to rolling my own.
A join (either OUTER or INNER) did not give me what I wanted, though the relationship could be established, so I decided to write a stored procedure to accomplish this. Now, to be honest, I have not profiled this to see how it can be optimized (I'd welcome insights regarding that) but for small set of data (e.g. < 10,000 rows) I think this approach might work reasonably well.
Basically, the idea is that the stored procedure establishes two temporary tables: one for the final output and one for the work in progress. You can call the procedure, indicating the root node for the hierarchy (defaulting to the top - those items that have no parent; identified by NULL). From that point, the procedure begins to analyze the data, writing into the temporary work table (called #Stack) and recording records as necessary into the #Output table.
My strategy (as this will be used in a web-based application and will rarely, if ever, change) is to query it once and cache it so I don't have to perform this analysis over and over again. Additionally, I'm really only expecting < 50 records so the hit is pretty minimal. What do you think of the approach? Good? Bad? Ugly? Ingenious? Mediocre? I'd be interested in getting some good feedback.
CREATE PROCEDURE dproc_GetCategoryList
@RootParentID int = NULL
AS
SET NOCOUNT ON
-- establish the table that will contain the output results
CREATE TABLE #Output ( ID int, Name nvarchar(35), ParentID int )
-- establish the work table
-- this data will fluctuate and will represent the information that is being processed
CREATE TABLE #Stack ( ID int, Depth int )
DECLARE @currID int
DECLARE @depth int
IF ( @RootParentID IS NULL )
-- start at the parent level (there might be multiple top-level parents) and start working our way down
-- top-level parents are identified by not having parent themselves
DECLARE topLevel CURSOR FOR
SELECT ID FROM tblCategoryIndex WHERE ParentID IS NULL ORDER BY Name
ELSE
-- start at the level prescribed by the caller and build out the child hierarchy
DECLARE topLevel CURSOR FOR
SELECT ID FROM tblCategoryIndex WHERE ID = @RootParentID
OPEN topLevel
FETCH NEXT FROM topLevel INTO @currID
WHILE ( @@FETCH_STATUS = 0 )
BEGIN
SET @depth = 1
INSERT INTO #Stack VALUES ( @currID, @depth )
WHILE ( @depth > 0 )
BEGIN
IF EXISTS ( SELECT ID FROM #Stack WHERE Depth = @depth )
BEGIN
-- read the next record's id for processing so we can get its children and delete it
SELECT @currID = ID FROM #Stack WHERE Depth = @depth
DELETE FROM #Stack WHERE Depth = @depth AND ID = @currID
-- add the category to the output table
INSERT INTO #Output SELECT ID, Name, ParentID FROM tblCategoryIndex WHERE ID = @currID
-- get the children of the current category
INSERT INTO #Stack SELECT ID, @Depth + 1 FROM tblCategoryIndex WHERE ParentID = @currID ORDER BY Name DESC
IF ( @@ROWCOUNT > 0 )
SET @depth = @depth + 1
END
ELSE
-- move up one level, we're recursed as deep on this node branch as we can go
SET @depth = @depth - 1
END
FETCH NEXT FROM topLevel INTO @currID
END
CLOSE topLevel
DEALLOCATE topLevel
SET NOCOUNT OFF
-- return the result set
SELECT * FROM #Output
GO
- [Updated 09/28/2004 for proper sorting of the name - the list is now properly alphabetized in the result set.]