At first glance, Data Vault (DV) looks no different from any other "hybrid" DW that mixes 3NF (CIF, Inmon) and dimensional (BUS, Kimball) designs. The DV is a hybrid approach that is carefully crafted to create maximum synergy between ETL, modelling and architecture. In this sense it is a great kick-off for any DWH. Most approaches are either very flexible and generic (BUS or CIF) or very specific and rigid; DV tries to find the middle ground. While the technical data model can be described as a hybrid between star and 3NF/6NF, the approach and the ETL are specific to DV. What DV has in common with Kimball is the SCD-2 approach. Both rely on a basic technique called 'stacking' (adding a new row for each change rather than overwriting the old one), which DV always uses and BUS uses only for SCD-2. The DV Hubs are related to the Anchors of Anchor Modeling, which is based on 6NF.
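To make 'stacking' concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes one Hub and one Satellite for a hypothetical customer entity; the table names, column names, the MD5 hash key and the hash_diff change check are illustrative assumptions and do not come from the discussion above. The point is only that a change never updates a row: it adds a new Satellite row, while the Hub keeps a single row per business key.

```python
import hashlib
import sqlite3


def hash_key(*parts):
    """Deterministic key built from its parts (an illustrative choice)."""
    joined = "|".join(str(p) for p in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def create_schema(conn):
    """Create one hypothetical Hub and one Satellite that hangs off it."""
    conn.executescript("""
        CREATE TABLE hub_customer (
            customer_hk TEXT PRIMARY KEY,
            customer_bk TEXT NOT NULL UNIQUE,   -- business key
            load_ts     TEXT NOT NULL,
            record_src  TEXT NOT NULL
        );
        CREATE TABLE sat_customer (
            customer_hk TEXT NOT NULL REFERENCES hub_customer(customer_hk),
            load_ts     TEXT NOT NULL,
            hash_diff   TEXT NOT NULL,          -- fingerprint of the attributes
            name        TEXT,
            city        TEXT,
            record_src  TEXT NOT NULL,
            PRIMARY KEY (customer_hk, load_ts)
        );
    """)


def load_customer(conn, bk, name, city, load_ts, src="crm"):
    """Insert-only load. Returns True if a new Satellite row was stacked."""
    hk = hash_key(bk)
    # The Hub gets one row per business key; repeated loads are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer "
        "(customer_hk, customer_bk, load_ts, record_src) VALUES (?, ?, ?, ?)",
        (hk, bk, load_ts, src),
    )
    diff = hash_key(name, city)
    latest = conn.execute(
        "SELECT hash_diff FROM sat_customer "
        "WHERE customer_hk = ? ORDER BY load_ts DESC LIMIT 1",
        (hk,),
    ).fetchone()
    if latest is not None and latest[0] == diff:
        return False  # nothing changed, so nothing is stacked
    # A change never updates in place: it adds a new row on top of the old ones.
    conn.execute(
        "INSERT INTO sat_customer "
        "(customer_hk, load_ts, hash_diff, name, city, record_src) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (hk, load_ts, diff, name, city, src),
    )
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    load_customer(conn, "C-1", "Ann", "Oslo", "2024-01-01")
    load_customer(conn, "C-1", "Ann", "Oslo", "2024-01-02")    # unchanged
    load_customer(conn, "C-1", "Ann", "Bergen", "2024-01-03")  # changed city
    for row in conn.execute(
        "SELECT load_ts, name, city FROM sat_customer ORDER BY load_ts"
    ):
        print(row)
```

In a Kimball dimension, the same idea appears only for attributes handled as SCD-2; here every Satellite load follows it.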
Positive point about Data Vault: it is faster to jump into development mode by adopting Data Vault than by defining your own architecture rules based on CIF, BUS, or both.
Negative point about Data Vault: it implies a bunch of architecture decisions that may not suit every enterprise, jeopardizing the success of the BI/DW program.
Probable reply by Dan posted by Ram Iyer,
1) I would like to know more about it from early adopters before talking to someone. I thought you might have some good ideas to share with me on DV pros and cons.
I'm always willing to help people understand Data Vault correctly by pointing them to resources or helping them directly. The best place for questions is www.datavaultinstitute.com/. As one of the principal contributors, I'll try to answer all questions about Data Vault there, together with the other DV specialists.
2) How do we know which customers are a fit for DV and which are not?
By using a list of architecture principles derived from the customer's mission statements and requirements. Requirements such as 100% auditable/trackable (source) data through time, flexibility, incremental build, ODS-like functionality and working without a fixed business information model are sure signs that Data Vault is usable. Huge volumes of data, real-time loading and extensive data mining are also a plus for Data Vault.
3) How does the customer know that DV will serve them better?
By putting the strengths of DV against their requirements. DV's unique selling points should be clearly linked to those requirements.
4) Are there any standard evaluation spec/criteria for DV?
Could you clarify?
5) What are the drivers that define success in your implementations (examples welcome)?
Flexibility, auditability and history are the most important. Scalability does not play an important role.
6) What are the pain-creating events that make our DW/BI community buy into DV?
-Flexibility: Incremental development
-History
-Source preserving
-Limited automation
7) What are the reasons DV succeeds compared to others?
-Scalability
-Flexibility: Incremental development
-History
-Auditable
-Automating DV development and change management
-Resilient/Fault tolerant
8) I learned about the linkage between DW 2.0 and DV. Do you have a white paper or document on this?
No, there are no specific documents on this. It is covered in the online training at inmoninstitute.com; the DV training and online material reference it only briefly.
9) There are some tools that automate data warehouse development. Do any such tools use DV?
BIready internally uses a derivation of the DV model, but BIready is a Business Vault and not a Data Vault. You can create and automate a DV-based DWH with almost any ETL/modelling tool, but there are toolsets in development that address DV specifically.
10) Does DV support/recommend a big bang or an incremental development approach?
DV works better with an incremental approach than almost any other approach to a central EDW. Incremental development is a best practice in DV, but it is not required.
11) What data warehouse sizes (small/medium/large) fit DV implementations?
DV scales extremely well, into the petabyte range. Small implementations will not benefit as much from the ETL scaling possibilities as large ones.
12) Do you have suggestions for building a matrix/measures to arrive at a timeline/estimate for a DV-based implementation?
No, but there is one that is used by DV creator Dan Linstedt. It is the basis for the DV project management course, but I have not attended that class yet.
13) Do you allow end/business users to query/report on the DV layer directly?
No, but good views or semantic layers can be built directly on the DV for end-user queries.
14) If yes, how did they learn/get trained on Hub and Satellite structures?
Usually you hide this behind a view/semantic layer that is 3NF, 3NF+time, star schema or a hybrid.
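As a rough illustration of hiding Hubs and Satellites behind a view, the sketch below builds a dimension-like view that shows only the current row per customer. It uses Python's sqlite3 module; the tables, columns, sample rows and the view name dim_customer_current are hypothetical and chosen only for this example.

```python
import sqlite3


def build_demo(conn):
    """Create a hypothetical Hub/Satellite pair plus a user-facing view."""
    conn.executescript("""
        CREATE TABLE hub_customer (
            customer_hk TEXT PRIMARY KEY,
            customer_bk TEXT NOT NULL
        );
        CREATE TABLE sat_customer (
            customer_hk TEXT NOT NULL,
            load_ts     TEXT NOT NULL,
            name        TEXT,
            city        TEXT,
            PRIMARY KEY (customer_hk, load_ts)
        );
        INSERT INTO hub_customer VALUES ('hk1', 'C-1'), ('hk2', 'C-2');
        INSERT INTO sat_customer VALUES
            ('hk1', '2024-01-01', 'Ann', 'Oslo'),
            ('hk1', '2024-03-01', 'Ann', 'Bergen'),
            ('hk2', '2024-02-01', 'Bob', 'Turku');

        -- End users see one row per customer; keys and history stay hidden.
        CREATE VIEW dim_customer_current AS
        SELECT h.customer_bk AS customer_id,
               s.name        AS name,
               s.city        AS city
        FROM hub_customer h
        JOIN sat_customer s ON s.customer_hk = h.customer_hk
        WHERE s.load_ts = (
            SELECT MAX(s2.load_ts)
            FROM sat_customer s2
            WHERE s2.customer_hk = h.customer_hk
        );
    """)


def current_customers(conn):
    """Query the view the way a reporting user would."""
    return conn.execute(
        "SELECT customer_id, name, city "
        "FROM dim_customer_current ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    for row in current_customers(conn):
        print(row)
```

A '3NF+time' flavour of the same view would keep load_ts in the output and drop the MAX filter, so that users can see the full history.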
15) As DV adopts normalized structures with integrity constraints etc., it needs extensive joins for direct queries on the DV layer. How does this impact performance with large volumes of data?
You need to optimize join performance, but you can also materialize joins when required. DV works well with columnar databases like Teradata.
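One simple way to picture 'materializing a join' is to store the result of the Hub/Link/Satellite join in a flat table that is rebuilt after each load. The sketch below does this with Python's sqlite3 module. All table and column names are hypothetical, and a drop-and-recreate refresh is only an assumption made for brevity; a real platform might use materialized views or incremental refreshes instead.

```python
import sqlite3

# Hypothetical vault structures: two Hubs, a Link between them, one Satellite.
SCHEMA = """
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_bk TEXT NOT NULL);
CREATE TABLE hub_product  (product_hk  TEXT PRIMARY KEY, product_bk  TEXT NOT NULL);
CREATE TABLE link_order (
    order_hk    TEXT PRIMARY KEY,
    customer_hk TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    product_hk  TEXT NOT NULL REFERENCES hub_product(product_hk)
);
CREATE TABLE sat_order (
    order_hk TEXT NOT NULL,
    load_ts  TEXT NOT NULL,
    quantity INTEGER,
    PRIMARY KEY (order_hk, load_ts)
);
"""

# The expensive four-way join, stored once instead of repeated per query.
MART_SQL = """
CREATE TABLE mart_order AS
SELECT c.customer_bk AS customer_id,
       p.product_bk  AS product_id,
       s.quantity    AS quantity,
       s.load_ts     AS as_of
FROM link_order l
JOIN hub_customer c ON c.customer_hk = l.customer_hk
JOIN hub_product  p ON p.product_hk  = l.product_hk
JOIN sat_order    s ON s.order_hk    = l.order_hk
WHERE s.load_ts = (
    SELECT MAX(s2.load_ts) FROM sat_order s2 WHERE s2.order_hk = l.order_hk
)
"""


def seed(conn):
    """Load one order so the example has something to join."""
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO hub_customer VALUES ('c1', 'C-1')")
    conn.execute("INSERT INTO hub_product VALUES ('p9', 'P-9')")
    conn.execute("INSERT INTO link_order VALUES ('o1', 'c1', 'p9')")
    conn.execute("INSERT INTO sat_order VALUES ('o1', '2024-01-01', 3)")
    conn.commit()


def refresh_mart(conn):
    """Rebuild the flat table from the vault (full refresh, for simplicity)."""
    conn.execute("DROP TABLE IF EXISTS mart_order")
    conn.execute(MART_SQL)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    seed(conn)
    refresh_mart(conn)
    # Queries now read the flat table and need no joins at all.
    print(conn.execute("SELECT * FROM mart_order").fetchall())
```

The trade-off is freshness: the flat table reflects the vault only as of the last refresh, so it has to be rebuilt (or incrementally maintained) after each load.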
Ghosh concludes: Judging from the feedback, it appears that the DV isn't directly queryable, and one must design a set of queryable objects on top of it. A star schema approach is directly queryable.
I am getting the sense that the DV is more of an optimal storage mechanism for detailed, historical transactional data. You still need to transform it to get the business view. It seems to lean closer to the CIF approach in that respect - model the data in the enterprise first, then design for queries.