What Is a Data Warehouse? Warehousing Data, Data Mining Explained

What Is a Data Warehouse?

A data warehouse is the secure electronic storage of information by a business or other organization. The goal of a data warehouse is to create a trove of historical data that can be retrieved and analyzed to provide useful insight into the organization's operations.

A data warehouse is a vital component of business intelligence. That wider term encompasses the information infrastructure that modern businesses use to track their past successes and failures and inform their decisions for the future.

Key Takeaways

  • A data warehouse is the storage of information over time by a business or other organization.
  • New data is periodically added by people in various key departments such as marketing and sales.
  • The warehouse becomes a library of historical data that can be retrieved and analyzed in order to inform decision-making in the business.
  • The key factors in building an effective data warehouse include defining the information that is critical to the organization and identifying the sources of the information.
  • A database is designed to supply real-time information. A data warehouse is designed as an archive of historical information.

How a Data Warehouse Works

The need to warehouse data evolved as businesses began relying on computer systems to create, file, and retrieve important business documents. The concept of data warehousing was introduced in 1988 by IBM researchers Barry Devlin and Paul Murphy.

Data warehousing is designed to enable the analysis of historical data. Comparing data consolidated from multiple heterogeneous sources can provide insight into the performance of a company. A data warehouse is designed to allow its users to run queries and analyses on historical data derived from transactional sources.

Data added to the warehouse does not change and cannot be altered. The warehouse is the source that is used to run analytics on past events, with a focus on changes over time. Warehoused data must be stored in a manner that is secure, reliable, easy to retrieve, and easy to manage.

Maintaining a Data Warehouse

Maintaining a data warehouse involves several steps. The first is data extraction, which involves gathering large amounts of data from multiple source points. After a set of data has been compiled, it goes through data cleaning, the process of combing through it for errors and correcting or excluding any that are found.

The cleaned-up data is then converted from a database format to a warehouse format. Once stored in the warehouse, the data goes through sorting, consolidating, and summarizing, so that it will be easier to use. Over time, more data is added to the warehouse as the various data sources are updated.
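
The cleaning and summarizing steps described above can be illustrated with a minimal Python sketch. The record fields (`region`, `amount`) and the rules for what counts as an error are assumptions made for illustration; a real warehouse would define its own.

```python
from collections import defaultdict

# Hypothetical raw records extracted from source systems (assumed fields).
raw_records = [
    {"region": "East", "amount": "120.50"},
    {"region": " west ", "amount": "80"},
    {"region": "East", "amount": "not available"},  # bad amount: excluded
    {"region": "", "amount": "15"},                  # missing region: excluded
    {"region": "West", "amount": "20.25"},
]


def clean(records):
    """Correct simple formatting problems and exclude unusable rows."""
    cleaned = []
    for row in records:
        region = row["region"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # amount cannot be parsed, so the row is excluded
        if not region:
            continue  # no region to attribute the sale to
        cleaned.append({"region": region, "amount": amount})
    return cleaned


def summarize(records):
    """Consolidate cleaned rows into a total per region, sorted by region."""
    totals = defaultdict(float)
    for row in records:
        totals[row["region"]] += row["amount"]
    return dict(sorted(totals.items()))


cleaned_records = clean(raw_records)
summary = summarize(cleaned_records)
print(summary)
```

Here two of the five rows are excluded during cleaning, and the remaining three are consolidated into one total per region.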

A key book on data warehousing is W. H. Inmon's Building the Data Warehouse, a practical guide that was first published in 1990 and has been reprinted several times.

Today, businesses can invest in cloud-based data warehouse software services from companies including Microsoft, Google, Amazon, and Oracle, among others.

Data Mining

Businesses warehouse data primarily for data mining. That involves looking for patterns of information that will help them improve their business processes.
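
As a rough illustration of what "looking for patterns" can mean, the sketch below counts which pairs of products appear together in the same order. The order data and the `pair_counts` helper are hypothetical, and real data mining tools use far more sophisticated techniques than simple pair counting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history pulled from a warehouse (assumed data).
orders = [
    {"bike", "mat", "bottle"},
    {"bike", "mat"},
    {"treadmill", "bottle"},
    {"bike", "bottle"},
    {"bike", "mat", "towel"},
]


def pair_counts(baskets):
    """Count how many orders contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a stable (alphabetical) order
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(orders)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

A pair that shows up in many orders might suggest a bundle or a cross-promotion worth testing.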

A good data warehousing system makes it easier for different departments within a company to access each other's data. For example, a marketing team can assess the sales team's data in order to make decisions about how to adjust their sales campaigns.

The 5 Steps of Data Mining

The data mining process breaks down into five steps:

  1. An organization collects data and loads it into a data warehouse.
  2. The data is then stored and managed, either on in-house servers or in a cloud service.
  3. Business analysts, management teams, and information technology professionals access and organize the data.
  4. Application software sorts the data.
  5. The end-user presents the data in an easy-to-share format, such as a graph or table.

Data Warehouse Architecture

The design of a data warehouse is known as data warehouse architecture. Depending on the needs of the organization, it can be built in one or more tiers. Typically there are single-tier, two-tier, and three-tier architecture designs.

Single-tier Architecture: Single-tier architecture is rarely used to build data warehouses for real-time systems. It is often used for batch and real-time processing of operational data. A single-tier design is composed of a single layer of hardware with the goal of keeping data space at a minimum.

Two-tier Architecture: In a two-tier architecture design, the analytical process is separated from the business process. The point of this is to increase levels of control and efficiency.

Three-tier Architecture: A three-tier architecture design has three tiers: the source layer, the reconciled layer, and the data warehouse layer. This design is suited for systems with long life cycles. When the data changes, it passes through an extra layer of review and analysis to ensure that no errors have been introduced.

Regardless of the tier, all data warehouse architectures must meet the same five properties: separation, scalability, extensibility, security, and administrability.

Data Warehouse vs. Database

A data warehouse is not the same as a database:

  • A database is a transactional system that monitors and updates real-time data in order to have only the most recent data available.
  • A data warehouse is programmed to aggregate structured data over time.

For example, a database might only have the most recent address of a customer, while a data warehouse might have all the addresses of the customer for the past 10 years.
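
The address example above can be sketched with Python's built-in `sqlite3` module. The table names, column names, and the `record_address` helper are assumptions made for illustration: the operational table overwrites the customer's address, while the warehouse-style table appends every address with the date it took effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Operational table: one row per customer, overwritten on change.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT)")
# Warehouse-style table: one row per address the customer has had.
conn.execute(
    "CREATE TABLE customer_address_history "
    "(customer_id INTEGER, address TEXT, valid_from TEXT)"
)


def record_address(customer_id, address, valid_from):
    """Overwrite the operational row and append to the history table."""
    conn.execute(
        "INSERT OR REPLACE INTO customer (id, address) VALUES (?, ?)",
        (customer_id, address),
    )
    conn.execute(
        "INSERT INTO customer_address_history VALUES (?, ?, ?)",
        (customer_id, address, valid_from),
    )


record_address(1, "12 Oak St", "2015-03-01")
record_address(1, "98 Pine Ave", "2019-07-15")
record_address(1, "5 Elm Rd", "2023-01-10")

current = conn.execute("SELECT address FROM customer WHERE id = 1").fetchall()
history = conn.execute(
    "SELECT address FROM customer_address_history "
    "WHERE customer_id = 1 ORDER BY valid_from"
).fetchall()
print(current)
print(history)
```

After three address changes, the operational table holds only the latest address, while the history table holds all three.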

Data mining relies on the data warehouse. The data in the warehouse is sifted for insights into the business over time.

Data Warehouse vs. Data Lake

Both data warehouses and data lakes hold data for a variety of needs. The primary difference is that a data lake holds raw data whose purpose has not yet been determined. A data warehouse, on the other hand, holds refined data that has been filtered for a specific purpose.

Data lakes are primarily used by data scientists while data warehouses are most often used by business professionals. Data lakes are also more easily accessible and easier to update while data warehouses are more structured and any changes are more costly.

Data Warehouse vs. Data Mart

A data mart is just a smaller version of a data warehouse. A data mart collects data from a small number of sources and focuses on one subject area. Data marts are faster and easier to use than data warehouses.

Data marts typically function as a subset of a data warehouse to focus on one area for analytical purposes, such as a specific department within an organization. Data marts are used to help make business decisions by helping with analysis and reporting.
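
One simple way to picture a data mart as a subset of a warehouse is a database view restricted to one department. The sketch below uses Python's built-in `sqlite3` module; the table, view, and column names are hypothetical, and a view is only one of several ways a data mart might be built in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "units_sold", 340.0),
        ("sales", "revenue", 51000.0),
        ("marketing", "ad_spend", 8000.0),
        ("support", "tickets_closed", 120.0),
    ],
)
# The "mart" here is just a view restricted to one department's rows.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT metric, value FROM warehouse_facts WHERE department = 'sales'"
)
sales_rows = conn.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
print(sales_rows)
```

Analysts in the sales department can query `sales_mart` without seeing or filtering through rows that belong to other departments.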

Advantages and Disadvantages of Data Warehouses

A data warehouse is intended to give a company a competitive advantage. It creates a resource of pertinent information that can be tracked over time and analyzed in order to help a business make more informed decisions.

It also can drain company resources and burden its current staff with routine tasks intended to feed the warehouse machine. Some other disadvantages include the following:

  • It takes considerable time and effort to create and maintain the warehouse.
  • Gaps in information, caused by human error, can take years to surface, damaging the integrity and usefulness of the information.
  • When multiple sources are used, inconsistencies between them can cause information losses.

Pros

  • Provides fact-based analysis on past company performance to inform decision-making.
  • Serves as a historical archive of relevant data.
  • Can be shared across key departments for maximum usefulness.

Cons

  • Creating and maintaining the warehouse is resource-heavy.
  • Input errors can damage the integrity of the information archived.
  • Use of multiple sources can cause inconsistencies in the data.

What Is a Data Warehouse and What Is It Used for?

A data warehouse is an information storage system for historical data that can be analyzed in numerous ways. Companies and other organizations draw on the data warehouse to gain insight into past performance and plan improvements to their operations.

What Is a Data Warehouse Example?

Consider a company that makes exercise equipment. Its best seller is a stationary bicycle, and it is considering expanding its line and launching a new marketing campaign to support it.

It goes to its data warehouse to understand its current customers better. It can find out whether its customers are predominantly women over 50 or men under 35. It can learn more about the retailers that have been most successful in selling its bikes, and where they're located. It might be able to access in-house survey results and find out what its past customers have liked and disliked about its products.

All of this information helps the company decide what kind of new bicycle models it wants to build and how it will market and advertise them. It's hard information rather than seat-of-the-pants decision-making.

What Are the Stages of Creating a Data Warehouse?

There are at least seven stages to the creation of a data warehouse, according to ITPro Today, an industry publication. They include:

  • Determining the business objectives and their key performance indicators.
  • Collecting and analyzing the appropriate information.
  • Identifying the core business processes that contribute the key data.
  • Constructing a conceptual data model that shows how the data are displayed to the end-user.
  • Locating the sources of the data and establishing a process for feeding data into the warehouse.
  • Establishing a tracking duration. Data warehouses can become unwieldy. Many are built with levels of archiving, so that older information is retained in less detail.
  • Implementing the plan.
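
The idea of retaining older information in less detail can be sketched in Python as follows. The row layout, the cutoff date, and the `archive_older_than` helper are assumptions for illustration: rows on or after the cutoff stay at daily detail, while older rows are rolled up into monthly totals.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales rows (assumed fields: day, amount).
daily_rows = [
    (date(2022, 1, 5), 100.0),
    (date(2022, 1, 20), 50.0),
    (date(2022, 2, 3), 75.0),
    (date(2024, 6, 1), 30.0),
    (date(2024, 6, 2), 45.0),
]


def archive_older_than(rows, cutoff):
    """Keep rows on or after `cutoff` in full detail; roll up older
    rows into one total per (year, month)."""
    recent = []
    monthly = defaultdict(float)
    for day, amount in rows:
        if day >= cutoff:
            recent.append((day, amount))
        else:
            monthly[(day.year, day.month)] += amount
    return recent, dict(monthly)


recent_detail, monthly_archive = archive_older_than(daily_rows, date(2024, 1, 1))
print(recent_detail)
print(monthly_archive)
```

The three rows from 2022 collapse into two monthly totals, which takes less space at the cost of losing day-level detail.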

Is SQL a Data Warehouse?

SQL, or Structured Query Language, is a computer language that is used to interact with a database in terms that it can understand and respond to. It contains a number of commands such as "select," "insert," and "update." It is the standard language for relational database management systems.
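
The three commands named above can be tried with Python's built-in `sqlite3` module, which runs SQL against a small embedded database. The `product` table and its contents are hypothetical and exist only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT PRIMARY KEY, price REAL)")

# INSERT adds rows.
conn.execute("INSERT INTO product VALUES ('stationary bike', 499.0)")
conn.execute("INSERT INTO product VALUES ('yoga mat', 25.0)")

# UPDATE changes existing rows.
conn.execute("UPDATE product SET price = 449.0 WHERE name = 'stationary bike'")

# SELECT reads rows back.
rows = conn.execute("SELECT name, price FROM product ORDER BY name").fetchall()
print(rows)
```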

A database is not the same as a data warehouse, although both are stores of information. A database is an organized collection of information. A data warehouse is an information archive that is continuously built from multiple sources.

What Is ETL in a Data Warehouse?

"ETL" stands for "extract, transform, and load." ETL is a data process that extracts data from multiple sources, transforms it into a consistent format, and loads it into a data warehouse or similar data system. It is used in data analytics and machine learning.

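The three ETL stages can be sketched in a few lines of Python using only the standard library. The two sources, their field names, and the `sales` table are assumptions made for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources with different formats and field names.
CSV_SOURCE = "customer,total\nalice,120.50\nbob,80.00\n"
JSON_SOURCE = '[{"name": "Carol", "amount_cents": 4550}]'


def extract():
    """Read raw rows from each source."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Map both sources onto one schema: (customer, amount in dollars)."""
    unified = [(r["customer"].title(), float(r["total"])) for r in csv_rows]
    unified += [(r["name"].title(), r["amount_cents"] / 100) for r in json_rows]
    return unified


def load(rows, conn):
    """Write the unified rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

The transform step is where the two sources are reconciled: names are capitalized the same way and cents are converted to dollars before anything is loaded.
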
The Bottom Line

The data warehouse is a company's repository of information about its business and how it has performed over time. Created with input from employees in each of its key departments, it is the source for analysis that reveals the company's past successes and failures and informs its decision-making.

Article Sources
  1. Wayback Machine: Computerworld. "The Story So Far."
  2. Amazon. "Building the Data Warehouse."
  3. G2. "Best Data Warehouse Software."
  4. Dataversity. "A Short History of Data Warehousing."
  5. IT Pro Today. "7 Steps to Data Warehousing."
  6. SQL Course. "What Is SQL?"
  7. Xplenty. "Data Warehouse vs. Database: 7 Key Differences."
