What Is Data Warehousing?
Data warehousing is the electronic storage of a large amount of information by a business or organization. Data warehousing is a vital component of business intelligence that employs analytical techniques on business data.
The concept of data warehousing was introduced in 1988 by IBM researchers Barry Devlin and Paul Murphy. The need to warehouse data evolved as computer systems became more complex and handled increasing amounts of data. A key book on data warehousing is W. H. Inmon's "Building the Data Warehouse," which was first published in 1990 and has been reprinted several times since.
How Data Warehousing Works
Data warehousing is used to provide greater insight into the performance of a company by comparing data consolidated from multiple heterogeneous sources. A data warehouse is designed to run query and analysis on historical data derived from transactional sources.
Once the data has been incorporated into the warehouse, it does not change and cannot be altered since a data warehouse runs analytics on events that have already occurred by focusing on the changes in data over time. Warehoused data must be stored in a manner that is secure, reliable, easy to retrieve and easy to manage.
There are certain steps that are taken to create a data warehouse. The first step is data extraction, which involves gathering large amounts of data from multiple source points. After the data has been compiled, it goes through data cleaning, the process of combing through the data for errors and correcting or excluding any errors found.
The cleaned-up data is then converted from a database format to a warehouse format. Once it’s stored in the warehouse, the data goes through sorting, consolidating, summarizing, etc. so that it’s more coordinated and easier to use. Over time, more data is added to the warehouse as the multiple data sources are updated.
- Data warehousing is the electronic storage of a large amount of information by a business or organization.
- A data warehouse is designed to run query and analysis on historical data derived from transactional sources for business intelligence and data mining purposes.
- Data warehousing is used to provide greater insight into the performance of a company by comparing data consolidated from multiple heterogeneous sources.
Special Considerations: Data Mining
Businesses might warehouse data for use in exploration and data mining, looking for patterns of information that will help them improve their business processes. A good data warehousing system can also make it easier for different departments within a company to access each other's data.
For example, a data warehouse might allow a company to easily assess the sales team's data and help to make decisions about how to improve sales or streamline the department. The business might choose to focus on its customers’ spending habits to better position its products and increase sales.
With data warehousing, the company can gather historical data of its customers’ spending over the past—say, 20 years—and run analytics on this data. The resulting information could provide insight into the preferences of its consumers; the time of day, month, or year with greater sales; or highest spending customer for the year.
Effective data storage and management are also what makes processes, such as initiating travel reservations and using automated teller machines possible.
The data mining process breaks down into five steps:
- Organizations collect data and load it into their data warehouses.
- They then store and manage the data, either on in-house servers or the cloud.
- Business analysts, management teams and information technology professionals access the data and determine how they want to organize it.
- Application software then sorts the data based on the user's results
- The end-user finally presents the data in an easy-to-share format, such as a graph or table.
Data Warehousing vs. Databases
A data warehouse is not necessarily the same concept as a standard database. A database is a transactional system that is set to monitor and update real-time data in order to have only the most recent data available. A data warehouse is programmed to aggregate structured data over a period of time. For example, a database might only have the most recent address of a customer, while a data warehouse might have all the addresses that the customer has lived in for the past 10 years.