Shrinkage! The word invokes memories of a particularly infamous Seinfeld episode. If you do not recall the TV show, all I can tell you is that “shrinkage” was not good for poor George. However, in the context of QlikView and Qlik Sense, shrinkage can help speed up development and unit testing. As such, I’ve created the QVD ShrinQer (pronounced “Shrinker”).
In my 11 years as part of the Qlik Consulting team, I’ve seen many customer sites where large data volumes slow down developers; testing a script change often requires several minutes, due to large QVDs that are processed by the load script. (Here we are assuming that a QVD layer is in use. If you’re not familiar with the concept of Extract and Transform QVD layers, please see this article).
Sub-setting QVD data facilitates rapid prototyping; 20% (or sometimes even 10% or 5%) sized-QVDs are quite useful in reducing load script execution times and thereby speeding up development & unit test cycles. However, one should exercise care here; if the QVDs are reduced too much, one could unknowingly create inefficient data models and expressions that would negatively impact end user performance. As such, you want the shrunken QVDs to be small enough for efficient development work, but not so small as to be misleadingly fast.
Ultimately, having a set of small QVDs is beneficial to performance in the long term; small QVDs allow developers to rapidly experiment with different designs and find options that are more performant, with the full data set as well as a subset of the data. Of course, final application & system testing should always be run with the full data set, so that test results will show the performance that end users will see. And developers should conduct periodic unit performance tests with the full data set, so that final testing does not reveal any surprises.
The QVD ShrinQer facilitates rapid development and unit testing by creating a set of reduced QVDs that exhibit relational integrity when linked in a data model. QVDs can be reduced to either:
- A percentage of the original size by record sampling, or
- A sub-set of records that have a match in another table. We arbitrarily refer to this other table as the “parent” table. (This naming convention is described further in the terminology note below).
Shrinkage rules are defined in an Excel file that is external to the QVD ShrinQer QVW (for QlikView) or QVF (for Qlik Sense); all of these files, as well as sample input QVDs, are included in the Utilities folder of the ZIP file for the latest TSEEQ release.
Consider the following table structure:
We might wish to keep 10% of records from the Customers table (the parent table in this example), as well as keep all Orders associated to that sub-set of Customers.
To achieve that goal, here are the rules we define for the QVD ShrinQer:
Table | Reduction Type | Parent Table | Percentage To Keep |
---|---|---|---|
Customers | PERCENTAGE | N/A | 10% |
Orders | KEEP_MATCHES_TO_PARENT | Orders | N/A |
In a slightly more complex case, the key in each of the QVDs is not commonly named:
Therefore, the rules for the QVD ShrinQer specify the name of the key field for each table as well as the table names:
Table | Reduction Type | Parent Table | Key Field Parent | Key Field Child | Percentage To Keep |
---|---|---|---|---|---|
SalesReps | PERCENTAGE | N/A | N/A | N/A | 50% |
Commissions | KEEP_MATCHES_TO_PARENT | SalesReps | SREP_ID | RepId | N/A |
Terminology note: the QVD ShrinQer “parent” concept is not [always] the same as the “parent” in a parent-child relationship. It’s difficult to choose a good word here that is not already used in some other context in data management. I considered using “master” instead of “parent”, but “master” is also an overloaded term. “Primary” would be another potentially confusing term, since the ShrinQer’s “parent” table does not have to contain the primary key in the strict relational modeling sense. In fact, one could reverse the first example from above, such that the table containing the foreign key is the parent. To reverse the first example, we would define rules as follows, with the Orders table as the new parent to the Customers table:
Table | Reduction Type | Parent Table | Percentage To Keep |
---|---|---|---|
Orders | PERCENTAGE | N/A | 10% |
Customers | KEEP_MATCHES_TO_PARENT | Orders | N/A |
Lastly, a child in one shrinkage rule can be the parent in another. For example, if we would like to keep 10% of Customer records, and all associated Orders, and also associated OrderDetails, then Orders can be both the child of Customer and the parent of OrderDetails.
Table | Reduction Type | Parent Table | Percentage To Keep |
---|---|---|---|
Customers | Percentage | N/A | 10% |
Orders | KEEP_MATCHES_TO_PARENT | Customers | N/A |
OrderDetails | KEEP_MATCHES_TO_PARENT | Orders | N/A |
To help visualize this, here are the three tables linked in a data model:
Please feel free to download the TSEEQ ZIP file and experiment with the example included therein under the Utilities\QVD ShrinQer folder, and also extend the example to shrink your own QVDs. Happy shrinQing!