Friday, December 27, 2013

Thoughts on Improving ADS 02 --Singe-chain History

Single-chain history

So far my thoughts on restructuring the history data has followed the ADS method where there is an individual history list for each cell. An alternative is to have a single history block per row for each update. Going back to the original example of table t, lets say that we start out with the values i=2, j=20, k=200 and execute the following updates in order:
update t set i=3, j=30
update t set j=40, k=400

After the update blocks are gone and just history remains, we are left with this:

Having a single history chain complicates things. The historic read algorithm is more complicated because the value that the reader wants may not be in the last history block older than the reader. In the above example, a reader older than either update would read k from the first history block but read i and j from the second. With the ADS strategy or history array, the reader can very quickly see if a given column has history because the history for each column is in a separate list. With the single history chain, if any column has history then for every column the reader accesses it has to potentially traverse a long history list even just to find out if that column has history.

On the other hand, having a single history chain could make multicolumn access faster when the columns do have history because the reader only has to traverse one list rather than one list for each column. Another advantage of this strategy is that it takes less space and it allows row updates and cell updates to coexist with very little added complexity. In other words, some updates could act as if the granularity of updates is whole rows (like many other RDBMSs do), preventing any other transaction from updating other cells in the row even if the first update didn’t update those particular cells. This would help to avoid rollbacks for certain workloads.

Reducing Memory Further

All of the alternative schemes I’ve discussed for representing history require a pointer in every row in the database in order to hold the pointer to the row history. Instead of putting a history pointer in each row, we could put one on each page, a single history pointer for the page that points to a list of row history objects. We could use a linked list of row history objects or an array. The linked list would generally take up less space if only a few rows are updated, but will always be slower.

No comments:

Post a Comment