Friday, May 25, 2018

Essbase Hybrid Calculation Engine


What to expect from your new “Hybrid Cube”

From time to time, I hear different theories about what “Hybrid BSO” is. Let me start with a very simple statement: “Hybrid BSO” is not a new storage type. In fact, there is no “Hybrid BSO”. What we have is a not-so-new Hybrid Calculation Engine that works on top of the BSO and ASO storage types and transparent partitions.

The closest explanation of what Hybrid is can be found in the blog post “Introduction to Essbase Hybrid Aggregation Mode” by Jake Turrell. From the end-user perspective, Jake is completely correct. The cube is separated into two parts: stored and dynamically calculated. In the BSO case, the stored part is the BSO blocks, and in the ASO case, the stored hierarchies represent the stored part. Users can run their calculation scripts on the stored part, and the dynamic part is calculated on demand. From the internal point of view, however, the dynamic calculations are powered by a completely new dynamic calculation engine. In fact, this new calculation engine became the backbone for multiple modules. We use it for all types of query processing (Smart View grids, MDX and report scripts), calculation (calc) scripts, and MDX Insert and Export (I will cover MDX Insert and Export in a separate blog post).

In his blog, Jake pointed to a number of limitations of the Hybrid engine. Most of these limitations have been removed in the Oracle Analytics Cloud (OAC) version of Essbase. He also described how to enable Hybrid. In OAC Essbase, you don’t need to enable Hybrid; it is enabled by default. Don’t worry, it will produce exactly the same results as your classic BSO cubes.
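For readers on on-premises Essbase, Hybrid is controlled by the ASODYNAMICAGGINBSO setting in essbase.cfg. A minimal sketch, assuming an application called Sample with a database called Basic (check the configuration reference for the values supported by your release; FULL covers both dynamic aggregations and formulas):

ASODYNAMICAGGINBSO Sample Basic FULL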

How does it work?

To understand what the Hybrid engine is, we should define the input and output of the Hybrid Calculation Engine. As an input, the engine gets a cube slice definition, which is nothing but a list of members for each dimension. As an output, the engine fills the slice with numbers.

To perform this job, the engine implements the following flow:
  • First, it retrieves the data from the source storage (ASO, BSO or partitions) 
  • Second, it performs the calculations 
  • And finally, it fills the slice with the result values
Smart View grid processing, MDX queries, and even calc scripts use the interface described above: they form the slices, get the numbers back, and then either return them to the user or push them back into the cube.
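To make the slice idea concrete, here is the kind of request a simple grid retrieve boils down to, written as MDX against the familiar Sample.Basic demo cube (purely illustrative):

SELECT
  {[Year].[Qtr1], [Year].[Qtr2]} ON COLUMNS,
  {[Measures].[Sales], [Measures].[Profit]} ON ROWS
FROM [Sample].[Basic]
WHERE ([Market].[East], [Scenario].[Actual]);

Every axis and the slicer are just lists of members, one list per dimension; the engine’s job is to fill that grid with values.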

Performance

I often get performance-related questions and feedback from people who try the Hybrid engine for the first time. In most cases, people assume that by simply enabling Hybrid mode, their queries will run faster. Usually, this isn’t true. If your cube didn’t have a lot of sparse dynamic calculations (and most likely it didn’t, because they are an absolute no-no in classic BSO), you won’t see any improvements.

It may surprise some people, but the main reason for us to implement Hybrid wasn’t query improvement. We actually tried to optimize cube calculation time. Most Essbase implementations are highly optimized for queries, and we rarely see issues with query response times; however, this comes at the price of long calculations, long restructuring, a lot of disk space, and complex backup/restore strategies. Our main goal was to make BSO cubes much smaller, and the easiest way to achieve this was to pre-calculate less data. This is why we decided to improve our sparse dynamic calculations.

To get benefits from Hybrid, you need to convert part of your stored calculations into dynamic calculations. This is a tricky process and it deserves a post on its own, but there are some general principles that I would like to mention here. 
  • Dynamic calculations are fast, but still slower than a simple retrieval of stored values. If your cube is huge, an attempt to convert every calculation into a dynamic calculation will result in long query times. This shouldn’t be a surprise for ASO developers. In ASO, we can create aggregated views to fix this issue; however, we don’t have aggregated views for BSO yet. If you have really big cubes, you should consider keeping some of your hierarchies stored. You can also keep some of your intermediate levels stored, which simulates ASO aggregated views (not as efficient, though). Look into the logs to understand how many blocks were scanned to answer a particular query; more blocks result in longer query times.
  • Hierarchy-based dynamic aggregations are much faster than dynamic formulas. Developers should move the formulas to the latest steps of the calculation (see the next section). If that isn’t possible from a correctness point of view, consider pre-calculating the problematic formulas (see the sketch after this list). Look into the logs to understand how many formulas were executed. If you see that a query executes millions of formulas, it’s a sign that you should change the solve order or move the formulas into calc scripts.
  • Block sizes and sparse/dense settings affect calculation performance. Take them into consideration when making decisions about your blocks.
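To illustrate the “pre-calculate the problematic formula” option from the second bullet, here is a minimal calc script sketch: the expensive formula is computed and stored at level 0, and the Hybrid engine is left to aggregate the stored result dynamically. The dimension and member names below are hypothetical:

SET UPDATECALC OFF;

/* Store the expensive formula at level 0 instead of leaving it dynamic */
FIX (@LEVMBRS("Entity", 0), @LEVMBRS("Period", 0))
   "Sales" = "Amount" * "Price";
ENDFIX

Because the upper levels of the sparse hierarchies stay dynamic, no AGG or CALC DIM is needed for them; only the formula itself is pre-calculated.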

Correctness

Now that we understand the rationale behind converting a cube to dynamic calculations, and the performance tuning principles, we should disclose another pitfall. As you probably remember, using "Dynamic Calculation" changes the order in which Essbase calculates values. By changing dynamic settings and sparse/dense settings, users may change the calculation order of their cube, which can lead to incorrect results.

The Hybrid Calculation Engine was designed to follow the exact calculation logic of classic BSO. This was done to make sure that if you move your classic BSO cube to OAC, it will produce the same results. I would suggest reviewing "Understanding How Dynamic Calculation Changes Calculation Order" before you start the conversion of your outline. I would also suggest checking the results of your formulas before and after conversion. In addition, check the log files for your favorite queries to see how many formula calculations they perform (remember the second bullet in the performance section). If you see wrong numbers after you have converted your cube to be dynamic, and/or some formulas are executing too many times, you should consider changing the dimension order and using solve order to change the calculation logic.
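A simple way to do that before/after check is to keep a small set of fixed queries against upper-level intersections and compare their output between the stored and the dynamic versions of the cube. A minimal sketch in MDX, using the cube and member names from the exercise below (adapt it to your own outline):

SELECT
  {[Measures].[Sales], [Measures].[Sales1], [Measures].[Revenue]} ON COLUMNS
FROM [SolveOrder].[demo];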

Exercise

Now, let’s practice a bit. Download the attached DBX workbook to create the cube. This cube has 5 dimensions. Measures is dense; the rest are sparse. All the members in the cube are stored. The DBX also includes a calc script called “All”, which simply does CALC ALL. Import the DBX file into your OAC instance and run the calc script.
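For reference, since the script simply does a full calculation, its body is essentially just the one statement:

CALC ALL;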

Look at the statistics for the SolveOrder.demo cube.
 
There are 3,264 blocks; only 780 of them are level-0 blocks. Now, set QUERYTRACE to -1 in the application CFG file and restart the application. This will enable full query tracing. Using Smart View, open the following grid.
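The trace switch itself is a single line in the application configuration (the application in this exercise is SolveOrder):

QUERYTRACE -1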



Notice the values of Sales, Sales1 and Revenue. In the data file, we have 780 data records. For each record, Amount is 2, Price is 10, and Discount is 5.
Sales and Sales1 have the same formula, “Amount * Price”, so each is 20 per record. Revenue has the formula “Sales - Discount”, which makes it 15 per record. When we multiply these numbers by the 780 records, we get the numbers on the grid: 20 × 780 = 15,600 for Sales and Sales1, and 15 × 780 = 11,700 for Revenue.

In the db directory, you will find the query_summary.txt file. This file stores some top-level information about the query. Notice the following lines:

Number of blocks read: 1
Number of cells acquired after scan/agg (for calculations and/or output): 6
Number of formula cache windows processed : 0
Total number of dynamic cells processed: 0
Number of aggregated cells : 0
Number of formula calculation cells : 0
Number of calculations returned #Missing : 0
Number of skipped executions: 0
Number of non-missing output cells produced by this query : 6

--------------------- Units processing times: --------------------------
Stored data scan and kernel aggregation total processing time: 0.000
Calculation units total processing time: 0.000
Total query processing time: 0.000 sec


This is the best query performance you can achieve with a BSO cube: the values come from a single block, and no dynamic calculations are performed. However, to get this performance, we created 3,264 - 780 = 2,484 upper-level blocks.

Now, let’s change all the hierarchies to be dynamic.

First, let’s see the cube statistics.

As you can see, all the upper-level blocks disappeared. This is because we no longer have any stored non-level-0 members in the cube. Let’s run the same Smart View query as before. The results are the same. Now, let’s look into the query_summary.txt file.

Number of blocks read: 780
Number of cells acquired after scan/agg (for calculations and/or output): 6
Number of formula cache windows processed : 0
Total number of dynamic cells processed: 0
Number of aggregated cells : 0
Number of formula calculation cells : 0
--------------------- Units processing times: --------------------------
Stored data scan and kernel aggregation total processing time: 0.010
Calculation units total processing time: 0.000
Total query processing time: 0.010 sec

Remember, we have 780 records in the cube, and we’re querying the upper block. Each record in the cube forms a block, and all the data is on level 0, so the Hybrid Calculation Engine has to scan all 780 blocks to answer our query. Note that the query became a little bit slower: it now takes an additional 0.010 seconds.

So far, we have only made the aggregations dynamic; now let’s make our formulas dynamic as well.

To do so, open the outline in edit mode, select a dimension, and click "Set Dynamic to Branch".

After all the dimensions except Measures are converted, click Save. Let's check the statistics again:

The number of blocks is the same, but the block size went down by about 40%. This is because we changed dense members to dynamic. For big data sets, a 40% storage reduction can mean a lot. Now, let’s run the same query again.


Oops, the numbers don’t match anymore. Let’s look into the query_summary.txt file.

Number of blocks read: 780
Number of cells acquired after scan/agg (for calculations and/or output): 3
Number of formula cache windows processed : 1
Total number of dynamic cells processed: 3
Number of aggregated cells : 0
Number of formula calculation cells : 3
--------------------- Units processing times: --------------------------
Stored data scan and kernel aggregation total processing time: 0.000
Calculation units total processing time: 0.010
Total query processing time: 0.010 sec

We see the same number of scanned blocks, but now our scan produces only 3 points instead of 6. The additional 3 points are calculated by the dynamic formulas. It looks like the engine first aggregated all the dimensions and only then calculated the formulas. Remember the pitfall we discussed earlier? Read the "Understanding How Dynamic Calculation Changes Calculation Order" article again. When the Revenue and Sales measures were stored, they were calculated before the aggregation of the rest of the dimensions; now they are calculated after. This means that for Sales, we now multiply total Amount by total Price: 780 × 2 = 1,560 times 780 × 10 = 7,800 gives 12,168,000 instead of 15,600, which doesn’t make any sense from the business perspective. How can we fix this?

We need to change the solve order of Sales so that it is calculated before all the aggregations. If you check the solve order in the UI, you will see that the solve order of sparse dynamic members is 10 and the solve order of dense dynamic members is 30. The Hybrid Calculation Engine follows the solve order and calculates members with lower solve orders before members with higher solve orders. This means that if we set the Sales solve order to 5, it will be calculated before we aggregate any of the sparse dimensions. Let’s try that.
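To summarize the resulting calculation order for this cube (lower solve order runs first; the values are the ones described above):

   5 - Sales (dense dynamic formula, now evaluated per stored level-0 block)
  10 - sparse dynamic hierarchies (aggregation)
  30 - Sales1 and Revenue (dense dynamic formulas, evaluated after the aggregation)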

Now, let’s run the same query again.

Now, we see correct results for Sales and Revenue, but not for Sales1. Sales1 being wrong is expected, since we didn’t change its solve order, but why is Revenue correct? The reason is simple. The formula of Revenue is “Sales - Discount”, and once Sales is calculated correctly, it doesn’t matter whether we calculate Revenue before the aggregation or after. Or does it?

Well, from the mathematical point of view, it doesn’t; however from the performance point of view, it does. Let’s take a look at our favorite query_summary.txt file.

Number of blocks read: 780
Number of cells acquired after scan/agg (for calculations and/or output): 5541
Number of formula cache windows processed : 801
Total number of dynamic cells processed: 782
Number of aggregated cells : 3120
Number of formula calculation cells : 782
--------------------- Units processing times: --------------------------
Stored data scan and kernel aggregation total processing time: 0.000
Calculation units total processing time: 0.020
Total query processing time: 0.020 sec

Some of these numbers are a total surprise. I can’t really understand how we get 5541 in the second row. I would imagine that this number would be 780 * 3 = 2,340, since this is the number of dependent cells required for the Sales calculation. However, 782 is completely clear: we executed the Sales formula 780 times, Sales1 once, and Revenue once. As you can see, we spent an additional 0.01 seconds to execute 781 additional formulas, which means that we can execute about 80,000 formulas per second. If our cube had about 5,000,000 input records, the calculation would take at least one minute (5,000,000 ÷ 80,000 ≈ 62 seconds). We should probably reconsider our decision to make Sales dynamic. At the same time, the Revenue formula was executed only once, so it’s completely fine to keep it dynamic.

Conclusions

I have seen multiple instances where application developers were able to achieve significant calculation improvements without compromising too much on query performance. Don’t expect to get there without any changes to your application; you will need to go through a careful design and tuning process. I believe that most BSO cubes in the world can be converted into dynamic cubes (at least partially dynamic ones).
