Converting an adjacency list model to a nested set model

 Joe Celko
12 Feb 2002
Rating: -4.77- (out of 5) Hall of fame tip of the month winner

As you will recall from my previous article, the usual example of a tree structure in SQL books is called an adjacency list model and it looks like this:


 CREATE TABLE Personnel
 (emp CHAR(10) NOT NULL PRIMARY KEY,
  boss CHAR(10) DEFAULT NULL REFERENCES Personnel(emp));

 Personnel
 emp       boss
 ===================
 'Albert'  'NULL'
 'Bert'    'Albert'
 'Chuck'   'Albert'
 'Donna'   'Chuck'
 'Eddie'   'Chuck'
 'Fred'    'Chuck'

Another way of representing trees is to show them as nested sets. Since SQL is a set oriented language, this is a better model than the usual adjacency list approach you see in most text books. Let us define a simple Personnel table like this, ignoring the left (lft) and right (rgt) columns for now. This problem is always given with a column for the employee and one for his boss in the textbooks. This table without the lft and rgt columns is called the adjacency list model, after the graph theory technique of the same name; the pairs of nodes are adjacent to each other.


 CREATE TABLE Personnel
 (emp CHAR(10) NOT NULL PRIMARY KEY,
  lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
  rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
  CONSTRAINT order_okay CHECK (lft < rgt) );

 Personnel
 emp         lft  rgt
 ======================
 'Albert'      1   12
 'Bert'        2    3
 'Chuck'       4   11
 'Donna'       5    6
 'Eddie'       7    8
 'Fred'        9   10

The organizational chart would look like this as a directed graph:


            Albert (1,12)
            /        \
           /          \
    Bert (2,3)     Chuck (4,11)
                   /    |     \
                  /     |      \
                 /      |       \
                /       |        \
        Donna (5,6)  Eddie 7,8)  Fred (9,10)

To show a tree as nested sets, replace the nodes with ovals, then nest subordinate ovals inside each other. The root will be the largest oval and will contain every other node. The leaf nodes will be the innermost ovals with nothing else inside them and the nesting will show the hierarchical relationship. The rgt and lft columns (I cannot use the reserved words LEFT and RIGHT in SQL) are what shows the nesting.

To convert the graph into a nested sets model think of a little worm crawling along the tree. The worm starts at the top, the root, makes a complete trip around the tree. When he comes to a node, he puts a number in the cell on the side that he is visiting and increments his counter. Each node will get two numbers, one of the right side and one for the left. Computer Science majors will recognize this as a modified preorder tree traversal algorithm.

The code for implementing this in T-SQL is a straight forward stack implementation. First, let's load up some data into a tree table and then create a stack table. I will explain how the stack works in a minute.


 -- Tree holds the adjacency model
CREATE TABLE Tree
(emp CHAR(10) NOT NULL,
 boss CHAR(10));

 -- insert the sample data for testing
INSERT INTO Tree VALUES ('Albert', NULL);
INSERT INTO Tree VALUES ('Bert', 'Albert');
INSERT INTO Tree VALUES ('Chuck', 'Albert');
INSERT INTO Tree VALUES ('Donna', 'Chuck');
INSERT INTO Tree VALUES ('Eddie', 'Chuck');
INSERT INTO Tree VALUES ('Fred', 'Chuck');

-- Stack starts empty, will holds the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
 emp CHAR(10) NOT NULL,
 lft INTEGER,
 rgt INTEGER);

Each row of the stack holds the nested set (lft, rgt) pair, the node value (emp) and an integer that represents the current top of the stack as an integer. When the stack_top is positive, something has been pushed onto the stack. When the stack_top is negative, it has been popped off the stack.

The algorithm is pretty straight forward, though there are some tricks about representing a stack in T-SQL. Here is what we know:

  1. We will do (2 * (SELECT COUNT(*) FROM Tree)) operation to build the (lft, rgt) pairs for each node. We need a general counter for this.
  2. When a node is pushed on the stack, we give it a lft number and increment the counter.
  3. When a node is popped from the stack, we give it a rgt number and increment the counter.
  4. We start at the root. Each nodes goes on and off the stack once and only once.
  5. We look at the top of the stack and push the youngest subordinate of that node onto the stack
  6. When the node on the top of stack is a leaf node or a node without "un-popped" subordinates back in the tree, pop it off the stack.

Here is the code in T-SQL:


DROP TABLE Stack;

CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
 child VARCHAR(10) NOT NULL,
 lft INTEGER NOT NULL,
 rgt INTEGER);

-- you can create optional indexes on stack_top and child columns


BEGIN
DECLARE @lft_rgt INTEGER, @stack_pointer INTEGER, @max_lft_rgt INTEGER;

SET @max_lft_rgt = 2 * (SELECT COUNT(*) FROM Tree);

INSERT INTO Stack
SELECT 1, child, 1, @max_lft_rgt 
  FROM Tree
 WHERE parent IS NULL;

SET @lft_rgt = 2;
SET @Stack_pointer = 1;

DELETE FROM Tree 
 WHERE parent IS NULL;

-- The Stack is now loaded and ready to use

WHILE (@lft_rgt < @max_lft_rgt)
BEGIN 
 IF EXISTS (SELECT *
              FROM Stack AS S1, Tree AS T1
             WHERE S1.child = T1.parent
               AND S1.stack_top = @stack_pointer)
    BEGIN -- push when stack_top has subordinates and set lft value
      INSERT INTO Stack
      SELECT (@stack_pointer + 1), MIN(T1.child), @lft_rgt, NULL
        FROM Stack AS S1, Tree AS T1
       WHERE S1.child = T1.parent
         AND S1.stack_top = @stack_pointer;

      -- remove this row from Tree 
      DELETE FROM Tree
       WHERE child = (SELECT child
                        FROM Stack
                       WHERE stack_top = @stack_pointer + 1);
      SET @stack_pointer = @stack_pointer + 1;
    END -- push
    ELSE
    BEGIN  -- pop the Stack and set rgt value
      UPDATE Stack
         SET rgt = @lft_rgt,
             stack_top = -stack_top
       WHERE stack_top = @stack_pointer 
      SET @stack_pointer = @stack_pointer - 1;
    END; -- pop
  SET @lft_rgt = @lft_rgt + 1;
  END; -- if
END; -- while

SELECT * FROM Stack ORDER BY lft;

 Stack
 stack_top   emp      lft  rgt
 -----------------------------
  -1         Albert   1     12
  -2         Bert     2      3
  -2         Chuck    4     11
  -3         Donna    5      6
  -3         Eddie    7      8
  -3         Fred     9     10

Note that the leftover stack_top numbers are the negatives of the depth of their node in the original tree. Also, notice that the original tree is being destroyed in this procedure; you might want to save and use a copy in a temporary table instead.

About the Author

Joe Celko is author of SQL for Smarties: Advanced SQL Programming (Morgan-Kaufmann, 1999).

For More Information

DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.