Joe Celko, 12 Feb 2002. Hall of fame tip of the month winner.
As you will recall from my previous article, the usual example of a tree structure in SQL books is called an adjacency list model and it looks like this:
CREATE TABLE Personnel
(emp CHAR(10) NOT NULL PRIMARY KEY,
 boss CHAR(10) DEFAULT NULL REFERENCES Personnel(emp));

Personnel
emp       boss
===================
'Albert'  NULL
'Bert'    'Albert'
'Chuck'   'Albert'
'Donna'   'Chuck'
'Eddie'   'Chuck'
'Fred'    'Chuck'
Another way of representing trees is to show them as nested sets. Since SQL is a set-oriented language, this is a better model than the usual adjacency list approach you see in most textbooks. Let us define a simple Personnel table like this; do not worry about what the left (lft) and right (rgt) columns mean for now. The textbooks always give this problem with a column for the employee and one for his boss. That version of the table, without the lft and rgt columns, is called the adjacency list model, after the graph theory technique of the same name, because the pairs of nodes are adjacent to each other.
CREATE TABLE Personnel
(emp CHAR(10) NOT NULL PRIMARY KEY,
 lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
 rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
 CONSTRAINT order_okay CHECK (lft < rgt));

Personnel
emp       lft  rgt
======================
'Albert'    1   12
'Bert'      2    3
'Chuck'     4   11
'Donna'     5    6
'Eddie'     7    8
'Fred'      9   10
The organizational chart would look like this as a directed graph:
              Albert (1,12)
              /         \
             /           \
      Bert (2,3)      Chuck (4,11)
                      /    |    \
                     /     |     \
                    /      |      \
                   /       |       \
         Donna (5,6)  Eddie (7,8)  Fred (9,10)
To show a tree as nested sets, replace the nodes with ovals, then nest subordinate ovals inside each other. The root will be the largest oval and will contain every other node. The leaf nodes will be the innermost ovals with nothing else inside them, and the nesting will show the hierarchical relationship. The rgt and lft columns (I cannot use the reserved words LEFT and RIGHT in SQL) are what show the nesting.
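The containment idea can be tried out directly. What follows is a minimal sketch, not part of the original tip: it uses Python's sqlite3 module instead of T-SQL so that it runs anywhere, loads the Personnel rows from the nested sets table, and asks which ovals sit inside or around a given oval. The helper names subordinates and superiors are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Personnel ("
    " emp CHAR(10) NOT NULL PRIMARY KEY,"
    " lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),"
    " rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),"
    " CONSTRAINT order_okay CHECK (lft < rgt))"
)
conn.executemany(
    "INSERT INTO Personnel VALUES (?, ?, ?)",
    [("Albert", 1, 12), ("Bert", 2, 3), ("Chuck", 4, 11),
     ("Donna", 5, 6), ("Eddie", 7, 8), ("Fred", 9, 10)],
)


def subordinates(boss):
    """Hypothetical helper: everyone whose oval lies strictly inside the boss's oval."""
    rows = conn.execute(
        "SELECT P2.emp"
        "  FROM Personnel AS P1, Personnel AS P2"
        " WHERE P1.emp = ?"
        "   AND P2.lft > P1.lft AND P2.lft < P1.rgt"
        " ORDER BY P2.lft",
        (boss,),
    )
    return [row[0] for row in rows]


def superiors(emp):
    """Hypothetical helper: everyone whose oval strictly contains the employee's oval."""
    rows = conn.execute(
        "SELECT P1.emp"
        "  FROM Personnel AS P1, Personnel AS P2"
        " WHERE P2.emp = ?"
        "   AND P1.lft < P2.lft AND P1.rgt > P2.rgt"
        " ORDER BY P1.lft",
        (emp,),
    )
    return [row[0] for row in rows]


print(subordinates("Chuck"))
print(superiors("Eddie"))
```

Both questions are answered by one set-oriented query each, with no loop over levels of the hierarchy, which is the practical payoff of storing the nesting in lft and rgt.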
To convert the graph into a nested sets model, think of a little worm crawling along the tree. The worm starts at the top, the root, and makes a complete trip around the tree. When he comes to a node, he puts a number in the cell on the side that he is visiting and increments his counter. Each node will get two numbers, one for the right side and one for the left. Computer Science majors will recognize this as a modified preorder tree traversal algorithm.
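The worm's trip can be written as a short recursive walk. This is only an illustrative sketch in Python, not the T-SQL approach this tip builds; the function name number_tree and the dictionary layout of the org chart are assumptions made for the example.

```python
def number_tree(children, root):
    """Assign (lft, rgt) pairs by walking the tree the way the worm does.

    `children` maps each node to an ordered list of its subordinates;
    nodes that do not appear as keys are treated as leaves.
    """
    counter = 1
    numbering = {}

    def visit(node):
        nonlocal counter
        lft = counter              # number written on the way down (left side)
        counter += 1
        for child in children.get(node, []):
            visit(child)
        numbering[node] = (lft, counter)  # number written on the way back up
        counter += 1

    visit(root)
    return numbering


org_chart = {
    "Albert": ["Bert", "Chuck"],
    "Chuck": ["Donna", "Eddie", "Fred"],
}
numbering = number_tree(org_chart, "Albert")
for emp, (lft, rgt) in sorted(numbering.items(), key=lambda item: item[1][0]):
    print(emp, lft, rgt)
```

Recursion keeps the sketch short. The T-SQL version replaces the call stack with an explicit stack table.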
The code for implementing this in T-SQL is a straightforward stack implementation. First, let's load some data into a tree table and then create a stack table. I will explain how the stack works in a minute.
-- Tree holds the adjacency model
CREATE TABLE Tree
(emp CHAR(10) NOT NULL,
boss CHAR(10));
-- insert the sample data for testing
INSERT INTO Tree VALUES ('Albert', NULL);
INSERT INTO Tree VALUES ('Bert', 'Albert');
INSERT INTO Tree VALUES ('Chuck', 'Albert');
INSERT INTO Tree VALUES ('Donna', 'Chuck');
INSERT INTO Tree VALUES ('Eddie', 'Chuck');
INSERT INTO Tree VALUES ('Fred', 'Chuck');
-- Stack starts empty, will hold the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
emp CHAR(10) NOT NULL,
lft INTEGER,
rgt INTEGER);
Each row of the stack holds the nested set (lft, rgt) pair, the node value (emp), and an integer (stack_top) that records the row's position in the stack. When stack_top is positive, the row has been pushed onto the stack and is still on it. When stack_top is negative, the row has been popped off the stack.
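The push and pop convention is easier to follow in memory first. The sketch below is a Python stand-in written for illustration, not the T-SQL itself: the function name nested_sets_from_adjacency is hypothetical, the stack is a list of dictionaries rather than a table, and the input rows are copied so the caller's data is left alone.

```python
def nested_sets_from_adjacency(tree_rows):
    """Convert (emp, boss) pairs into stack rows carrying (lft, rgt).

    A row is "on the stack" while its stack_top is positive; popping
    negates stack_top instead of deleting the row.
    """
    tree = [tuple(row) for row in tree_rows]      # work on a copy
    max_lft_rgt = 2 * len(tree)
    root = next(emp for emp, boss in tree if boss is None)
    stack = [{"stack_top": 1, "emp": root, "lft": 1, "rgt": max_lft_rgt}]
    tree = [(emp, boss) for emp, boss in tree if boss is not None]
    lft_rgt, pointer = 2, 1

    while lft_rgt < max_lft_rgt:
        top = next(row for row in stack if row["stack_top"] == pointer)
        waiting = [emp for emp, boss in tree if boss == top["emp"]]
        if waiting:
            # push: take the alphabetically first subordinate and set its lft
            child = min(waiting)
            stack.append({"stack_top": pointer + 1, "emp": child,
                          "lft": lft_rgt, "rgt": None})
            tree.remove((child, top["emp"]))
            pointer += 1
        else:
            # pop: set rgt and flip the sign of stack_top
            top["rgt"] = lft_rgt
            top["stack_top"] = -top["stack_top"]
            pointer -= 1
        lft_rgt += 1
    return stack


sample = [("Albert", None), ("Bert", "Albert"), ("Chuck", "Albert"),
          ("Donna", "Chuck"), ("Eddie", "Chuck"), ("Fred", "Chuck")]
for row in sorted(nested_sets_from_adjacency(sample), key=lambda r: r["lft"]):
    print(row)
```

Negating stack_top rather than deleting the row is what lets one structure serve as both the working stack and the finished result.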
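The algorithm is pretty straightforward, though there are some tricks about representing a stack in T-SQL. Here is what we know: the lft value of the root is 1, the rgt value of the root is twice the number of nodes in the tree, and every other lft and rgt value falls between those two.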
Here is the code in T-SQL:
DROP TABLE Stack;
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
 emp CHAR(10) NOT NULL,
 lft INTEGER NOT NULL,
 rgt INTEGER);
-- you can create optional indexes on the stack_top and emp columns
BEGIN
DECLARE @lft_rgt INTEGER, @stack_pointer INTEGER, @max_lft_rgt INTEGER;
SET @max_lft_rgt = 2 * (SELECT COUNT(*) FROM Tree);
-- push the root; both of its numbers are known up front
INSERT INTO Stack
SELECT 1, emp, 1, @max_lft_rgt
  FROM Tree
 WHERE boss IS NULL;
SET @lft_rgt = 2;
SET @stack_pointer = 1;
DELETE FROM Tree
 WHERE boss IS NULL;
-- The Stack is now loaded and ready to use
WHILE (@lft_rgt < @max_lft_rgt)
BEGIN
  IF EXISTS (SELECT *
               FROM Stack AS S1, Tree AS T1
              WHERE S1.emp = T1.boss
                AND S1.stack_top = @stack_pointer)
  BEGIN -- push when stack_top has subordinates and set lft value
    INSERT INTO Stack
    SELECT (@stack_pointer + 1), MIN(T1.emp), @lft_rgt, NULL
      FROM Stack AS S1, Tree AS T1
     WHERE S1.emp = T1.boss
       AND S1.stack_top = @stack_pointer;
    -- remove this row from Tree
    DELETE FROM Tree
     WHERE emp = (SELECT emp
                    FROM Stack
                   WHERE stack_top = @stack_pointer + 1);
    SET @stack_pointer = @stack_pointer + 1;
  END -- push
  ELSE
  BEGIN -- pop the Stack and set rgt value
    UPDATE Stack
       SET rgt = @lft_rgt,
           stack_top = -stack_top
     WHERE stack_top = @stack_pointer;
    SET @stack_pointer = @stack_pointer - 1;
  END; -- pop
  SET @lft_rgt = @lft_rgt + 1;
END; -- while
END;
SELECT * FROM Stack ORDER BY lft;
Stack
stack_top  emp     lft  rgt
-----------------------------
        1  Albert    1   12
       -2  Bert      2    3
       -2  Chuck     4   11
       -3  Donna     5    6
       -3  Eddie     7    8
       -3  Fred      9   10
Note that, for every node that was popped, the leftover stack_top number is the negative of the depth of that node in the original tree. The root is the one exception: its rgt value is set up front and the loop stops before it is popped, so it keeps stack_top = 1. Also, notice that the original tree is being destroyed in this procedure; you might want to save and use a copy in a temporary table instead.
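The depth that stack_top encodes can also be recomputed from the finished (lft, rgt) pairs alone, which gives an independent check on the conversion. The sketch below is an illustration under stated assumptions, not part of the original procedure: it runs the query through Python's sqlite3 module rather than SQL Server, and it loads the six result rows by hand into a table named Personnel.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Personnel ("
    " emp CHAR(10) NOT NULL PRIMARY KEY,"
    " lft INTEGER NOT NULL,"
    " rgt INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO Personnel VALUES (?, ?, ?)",
    [("Albert", 1, 12), ("Bert", 2, 3), ("Chuck", 4, 11),
     ("Donna", 5, 6), ("Eddie", 7, 8), ("Fred", 9, 10)],
)

# A node's depth is the number of ovals that contain it, counting its own.
depth_rows = conn.execute(
    "SELECT P2.emp, COUNT(*) AS depth"
    "  FROM Personnel AS P1, Personnel AS P2"
    " WHERE P2.lft BETWEEN P1.lft AND P1.rgt"
    " GROUP BY P2.emp, P2.lft"
    " ORDER BY P2.lft"
).fetchall()

for emp, depth in depth_rows:
    print(emp, depth)
```

Each depth should equal the absolute value of the stack_top left behind for that employee.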
About the Author
Joe Celko is author of SQL for Smarties: Advanced SQL Programming (Morgan-Kaufmann, 1999).