Data vault modeling

Data vault modeling is a database modeling method designed to preserve historical data from many different source systems over long periods of time. It is also a way of looking at historical data that deals with issues such as auditing, tracing where data came from, loading speed, and resilience to change.

Data vault modeling focuses on several things. First, it emphasizes the need to trace where all the data in the database came from. Each row carries extra attributes that record which source system the data came from and at what time it was loaded. These attributes let auditors trace any value back to its source.
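A minimal sketch of these audit attributes in Python; the names record_source, load_date, and CustomerRow follow common data vault conventions but are illustrative, not prescribed by the method:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CustomerRow:
    customer_key: str    # business key for the customer
    record_source: str   # which source system delivered this row
    load_date: datetime  # when the row was loaded into the warehouse

# Two rows about the same customer, loaded from different source systems:
rows = [
    CustomerRow("CUST-42", "crm_system", datetime(2024, 1, 5, tzinfo=timezone.utc)),
    CustomerRow("CUST-42", "billing_system", datetime(2024, 1, 6, tzinfo=timezone.utc)),
]

# An auditor can trace every value back to its source and load time.
for row in rows:
    print(f"{row.customer_key}: from {row.record_source} at {row.load_date:%Y-%m-%d}")
```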

Data vault modeling does not distinguish between data that follows the business rules and data that does not. Data that does not follow the rules is generally called "bad data". Dan Linstedt, who developed the method, said that a data vault stores "a single version of the facts". In most other data warehouse modeling schemes, the warehouse stores "a single version of the truth": data that does not follow the business rules is removed before it is stored.
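A small Python sketch of the difference, assuming a made-up business rule that ages must not be negative:

```python
incoming = [
    {"customer": "CUST-1", "age": 34},
    {"customer": "CUST-2", "age": -7},  # breaks the business rule
]

# Data vault style: keep a single version of the facts, exactly as received.
data_vault = list(incoming)

# "Single version of the truth" style: remove rows that break the rules.
single_truth = [row for row in incoming if row["age"] >= 0]

print(len(data_vault))    # 2 -- the bad data is kept and can be audited later
print(len(single_truth))  # 1 -- the bad data is gone before storage
```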

Because data vault modeling explicitly separates structural information, such as business keys and the relationships between them, from descriptive attributes, it can cope with change in the business environment. When the business changes, new tables can usually be added instead of changing tables that already exist.
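In data vault modeling, structural information goes into hubs (business keys) and links (relationships between keys), while descriptive attributes go into satellites. A rough Python sketch, with made-up table contents:

```python
# Structural information: hubs hold business keys, links hold relationships.
hub_customer = {"CUST-42": {"load_date": "2024-01-05", "record_source": "crm"}}
hub_product = {"PROD-7": {"load_date": "2024-01-05", "record_source": "erp"}}
link_purchase = [
    {"customer": "CUST-42", "product": "PROD-7",
     "load_date": "2024-01-06", "record_source": "web_shop"},
]

# Descriptive attributes: satellites hang off a hub's business key.
sat_customer = [
    {"customer": "CUST-42", "name": "Ada Lovelace", "city": "London",
     "load_date": "2024-01-05", "record_source": "crm"},
]

# When the business starts tracking something new, a new satellite is added;
# the hubs and links that describe the structure stay unchanged.
sat_customer_preferences = [
    {"customer": "CUST-42", "newsletter": True,
     "load_date": "2024-03-01", "record_source": "web_shop"},
]
```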

Finally, data vault modeling is designed to allow as much parallel loading as possible, so that very large implementations can scale out without major redesign.
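A minimal Python sketch of why this helps, with made-up table names: tables in the same group do not depend on each other, so each group can be loaded in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def load_table(name: str) -> str:
    # Stand-in for a real loading job (for example, a bulk insert from staging).
    return f"{name} loaded"

# Hubs do not depend on each other, so all hubs can load at the same time.
# Links and satellites only need their hubs' keys, so they form a second wave.
hubs = ["hub_customer", "hub_product"]
links_and_satellites = ["link_purchase", "sat_customer", "sat_product"]

with ThreadPoolExecutor() as pool:
    print(list(pool.map(load_table, hubs)))                  # first wave
    print(list(pool.map(load_table, links_and_satellites)))  # second wave
```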