Monday, September 27, 2004
« Clarity is Overrated | Main | Router Table screwed up »

Today I had an issue in which I wanted to build out a tree of data.  This information is collected in a single table with self joins to define the hierarchy.  I wanted to avoid having to repeatedly query the database for a subset of the records - instead, I wanted to get the whole batch from the database, already organized and set up in the tree.  I've done things such as this in Oracle via the CONNECT BY statement in the past, but unfortunately SQL Server (in its current incarnation) doesn't support this type of functionality; therefore, I had to resort to rolling my own.

A join (either OUTER or INNER) did not give me what I wanted, though the relationship could be established, so I decided to write a stored procedure to accomplish this.  Now, to be honest, I have not profiled this to see how it can be optimized (I'd welcome insights regarding that) but for small set of data (e.g. < 10,000 rows) I think this approach might work reasonably well.

Basically, the idea is that the stored procedure establishes two temporary tables: one for the final output and one for the work in progress.  You can call the procedure, indicating the root node for the hierarchy (defaulting to the top - those items that have no parent; identified by NULL).  From that point, the procedure begins to analyze the data, writing into the temporary work table (called #Stack) and recording records as necessary into the #Output table.

My strategy (as this will be used in a web-based application and will rarely, if ever, change) is to query it once and cache it so I don't have to perform this analysis over and over again.  Additionally, I'm really only expecting < 50 records so the hit is pretty minimal.  What do you think of the approach?  Good?  Bad?  Ugly?  Ingenious?  Mediocre?  I'd be interested in getting some good feedback.

CREATE PROCEDURE dproc_GetCategoryList
   @RootParentID int = NULL
AS
   SET NOCOUNT ON

   -- establish the table that will contain the output results
   CREATE TABLE #Output ( ID int, Name nvarchar(35), ParentID int )

   -- establish the work table
   -- this data will fluctuate and will represent the information that is being processed
   CREATE TABLE #Stack ( ID int, Depth int )

   DECLARE @currID int
   DECLARE @depth int

   IF ( @RootParentID IS NULL )
      -- start at the parent level (there might be multiple top-level parents) and start working our way down
      -- top-level parents are identified by not having parent themselves
      DECLARE topLevel CURSOR FOR
      SELECT ID FROM tblCategoryIndex WHERE ParentID IS NULL ORDER BY Name
   ELSE
      -- start at the level prescribed by the caller and build out the child hierarchy
      DECLARE topLevel CURSOR FOR
      SELECT ID FROM tblCategoryIndex WHERE ID = @RootParentID

   OPEN topLevel
   FETCH NEXT FROM topLevel INTO @currID
      WHILE ( @@FETCH_STATUS = 0 )
         BEGIN
         SET @depth = 1
         INSERT INTO #Stack VALUES ( @currID, @depth )

         WHILE ( @depth > 0 )
            BEGIN
               IF EXISTS ( SELECT ID FROM #Stack WHERE Depth = @depth )
                  BEGIN
                     -- read the next record's id for processing so we can get its children and delete it
                     SELECT @currID = ID FROM #Stack WHERE Depth = @depth
                     DELETE FROM #Stack WHERE Depth = @depth AND ID = @currID

                     -- add the category to the output table
                     INSERT INTO #Output SELECT ID, Name, ParentID FROM tblCategoryIndex WHERE ID = @currID

                     -- get the children of the current category
                     INSERT INTO #Stack SELECT ID, @Depth + 1 FROM tblCategoryIndex WHERE ParentID = @currID ORDER BY Name DESC

                     IF ( @@ROWCOUNT > 0 )
                        SET @depth = @depth + 1
                  END
               ELSE
                  -- move up one level, we're recursed as deep on this node branch as we can go
                  SET @depth = @depth - 1
            END

         FETCH NEXT FROM topLevel INTO @currID
      END
   CLOSE topLevel
   DEALLOCATE topLevel

   SET NOCOUNT OFF

   -- return the result set
   SELECT * FROM #Output
GO

- [Updated 09/28/2004 for proper sorting of the name - the list is now properly alphabetized in the result set.]

Monday, September 27, 2004 8:40:00 PM (Mountain Standard Time, UTC-07:00)  #    Disclaimer  |  Comments [1]  |  Trackback