Costing structures in relational databases

The issue

A common issue we faced when doing costing with any solution we had in-house was the typical hierarchical structure of you assembly tree. Say you have a seat frame, which itself consists of multiple parts which again consist of multiple parts and so on.

How you cost each part - say we just get the weight and multiply by some random price - is more or less independent on where it is located in your tree (ignoring overheads).

So you naturally have something like

class Part:
    id: int
    parent_id: int
    quantity: int # how often do I have the part?
    weight_kg: float

CREATE TABLE parts (
    id INT PRIMARY KEY,
    parent_id INT,
    quantity INT,
    weight_kg FLOAT
)

This is the classical N+1 query problem if you need to get a part and all of its children.

And this is needed basically for every cost calculation! If you want to calculate the cost of a given part, you need to query and calculate the cost of all children and get the quantities of all parents until the root. Thats a lot of queries when you have more then 10,000 items in your calculation.

Solution 1 - Adjecency Lists

We can just ‘duplicate’ the entries in the database so we always have all children for a part.

CREATE TABLE parts (
    id INT,
    parent_id INT,
    direct_parent_id INT,
    quantity INT,
    weight_kg FLOAT
)

For a structure like Part 1 -> Part 2 -> Part 3 we have then

id	parent_id	direct_parent_id
1	NULL	NULL
2	1	1
3	1	2
3	2	2

so we can just query for parent_id eq 1 and get the whole tree back. This has the significat drawback though that if you have a hierarchy which changes often (which is very much the case in active costings) WRITES become insanely costly, especially if you add items in the middle of the tree.

So no no for our use-case.

Solution 2 - Dedicated data structure

We are in luck that some nice database deigners have seen the problems with this. We have two ‘native’ solutions which I know of:

Both solution encode the tree path in something like /path/to/my/item in a singular field which you can query. In the case of the HierarchyId its a compact 38bit field which you need to correctly use to represent your hierarchy. We encoded the path and itemtype into the hierarchy id leading to tremendous speed ups!

I guess ltree works similar!

I’ll try to post some rough benchmarks on both solutions as I only have experience with the HierarchyId :)

Solution 3 - Document databases

Yeah yeah I know document databases could be nice here, but we also do benchmarking… maybe I add this to some benchmark and see what would have been :)