What Is Data Warehousing?
Data warehousing is the secure electronic storage of information by a business or other organization. The goal of data warehousing is to create a trove of historical data that can be retrieved and analyzed to provide useful insight into the organization's operations.
Data warehousing is a vital component of business intelligence. That wider term encompasses the information infrastructure that modern businesses use to track their past successes and failures and inform their decisions for the future.
- Data warehousing is the storage of information over time by a business or other organization.
- New data is periodically added by people in various key departments such as marketing and sales.
- The warehouse becomes a library of historical data that can be retrieved and analyzed in order to inform decision-making in the business.
- The key factors in building an effective data warehouse include defining the information that is critical to the organization and identifying the sources of the information.
- A database is designed to supply real-time information. A data warehouse is designed as an archive of historical information.
How Data Warehousing Works
The need to warehouse data evolved as businesses began relying on computer systems to create, file, and retrieve important business documents. The concept of data warehousing was introduced in 1988 by IBM researchers Barry Devlin and Paul Murphy.
Data warehousing is designed to enable the analysis of historical data. Comparing data consolidated from multiple heterogeneous sources can provide insight into the performance of a company. A data warehouse is designed to allow its users to run queries and analyses on historical data derived from transactional sources.
Data added to the warehouse do not change and cannot be altered. The warehouse is the source that is used to run analytics on past events, with a focus on changes over time. Warehoused data must be stored in a manner that is secure, reliable, easy to retrieve, and easy to manage.
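The rule that warehoused data are added but never altered can be illustrated in a few lines. This is a minimal sketch, assuming a simple in-memory Python store. The `AppendOnlyStore` and `Record` names are hypothetical and not part of any warehouse product. New records can be appended and read back by date range, but existing ones cannot be edited.

```python
from dataclasses import dataclass
from datetime import date
from types import MappingProxyType


@dataclass(frozen=True)
class Record:
    """One warehoused record; frozen=True blocks reassigning its fields."""
    loaded_on: date
    source: str
    values: MappingProxyType  # read-only view of the record's data


class AppendOnlyStore:
    """Hypothetical store: records can be added and read, never edited."""

    def __init__(self) -> None:
        self._records: list = []

    def append(self, loaded_on: date, source: str, values: dict) -> Record:
        # Copy the caller's dict so later changes to it cannot leak in,
        # then wrap the copy in a read-only mapping.
        record = Record(loaded_on, source, MappingProxyType(dict(values)))
        self._records.append(record)
        return record

    def between(self, start: date, end: date) -> list:
        """Return records loaded from start to end, inclusive."""
        return [r for r in self._records if start <= r.loaded_on <= end]


store = AppendOnlyStore()
jan = store.append(date(2023, 1, 31), "sales", {"units": 120})
store.append(date(2023, 2, 28), "sales", {"units": 95})
```

Attempting `jan.values["units"] = 0` raises a `TypeError`, and reassigning `jan.source` raises an error as well. A real warehouse would enforce this through its load process and access permissions rather than application code.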
Maintaining the Data Warehouse
There are certain steps that are taken to maintain a data warehouse. One step is data extraction, which involves gathering large amounts of data from multiple source points. After a set of data has been compiled, it goes through data cleaning, the process of combing through it for errors and correcting or excluding any that are found.
The cleaned-up data are then converted from a database format to a warehouse format. Once stored in the warehouse, the data go through sorting, consolidating, and summarizing, so that they will be easier to use. Over time, more data are added to the warehouse as the various data sources are updated.
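The maintenance steps just described (extraction, cleaning, conversion to a warehouse format, and then sorting, consolidating, and summarizing) can be sketched as a small Python pipeline. This is an illustration under stated assumptions: the source rows and field names are invented, and treating `(month, region, amount)` tuples as the "warehouse format" is a hypothetical stand-in. Real warehouses often handle these steps with dedicated tooling.

```python
from collections import defaultdict

# Hypothetical rows exported by two source systems.
SALES_ROWS = [
    {"region": "east", "month": "2023-01", "amount": "1200.50"},
    {"region": "West ", "month": "2023-01", "amount": "980"},
    {"region": "east", "month": "2023-02", "amount": "n/a"},  # erroneous value
]
WEB_ROWS = [
    {"region": "EAST", "month": "2023-01", "amount": "99.5"},
    {"region": "east", "month": "2023-02", "amount": "310.25"},
]


def extract(*sources):
    """Extraction: gather rows from several source points into one list."""
    return [row for source in sources for row in source]


def clean(rows):
    """Cleaning: normalize text and exclude rows with a non-numeric amount."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # exclude the bad row rather than guess a value
        cleaned.append({
            "region": row["region"].strip().lower(),
            "month": row["month"],
            "amount": amount,
        })
    return cleaned


def to_warehouse_format(rows):
    """Conversion: turn dict rows into (month, region, amount) tuples."""
    return [(r["month"], r["region"], r["amount"]) for r in rows]


def summarize(facts):
    """Consolidate and summarize: total amount per (month, region), sorted."""
    totals = defaultdict(float)
    for month, region, amount in facts:
        totals[(month, region)] += amount
    return dict(sorted(totals.items()))


summary = summarize(to_warehouse_format(clean(extract(SALES_ROWS, WEB_ROWS))))
```

Here the "n/a" row is excluded during cleaning, and the two East rows for January from different sources are consolidated into a single total.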
A key book on data warehousing is W. H. Inmon's "Building the Data Warehouse," a practical guide that was first published in 1990 and has been reprinted several times.
Today, businesses can invest in cloud-based data warehouse software services from companies including Microsoft, Google, Amazon, and Oracle, among others.
What Is Data Mining?
Businesses warehouse data primarily for data mining. That involves looking for patterns of information that will help them improve their business processes.
A good data warehousing system makes it easier for different departments within a company to access each other's data. For example, a marketing team can assess the sales team's data in order to decide how to adjust its campaigns.
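As a rough illustration of that kind of cross-department analysis, the sketch below joins a marketing table and a sales table on month. The figures, the table names, and the "units sold per $1,000 of spend" measure are all hypothetical assumptions chosen for the example.

```python
# Hypothetical monthly data owned by two departments, stored in one warehouse.
marketing_spend = {"2023-01": 5000, "2023-02": 8000, "2023-03": 8000}  # dollars
sales_units = {"2023-01": 250, "2023-02": 320, "2023-03": 480}


def units_per_thousand(spend: dict, units: dict) -> dict:
    """Join the two tables on month; units sold per $1,000 of marketing spend."""
    return {month: round(units[month] / (spend[month] / 1000), 1)
            for month in sorted(spend.keys() & units.keys())}


comparison = units_per_thousand(marketing_spend, sales_units)
```

With these invented numbers, March returns more units per dollar than February at the same spend, which is the kind of signal a marketing team might use when adjusting its campaigns.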
The 5 Steps of Data Mining
The data mining process breaks down into five steps:
- An organization collects data and loads it into a data warehouse.
- The data are then stored and managed, either on in-house servers or in a cloud service.
- Business analysts, management teams, and information technology professionals access and organize the data.
- Application software sorts the data.
- The end-user presents the data in an easy-to-share format, such as a graph or table.
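The five steps can be walked through end to end with Python's built-in `sqlite3` module. This is a toy sketch: an in-memory SQLite database stands in for in-house servers or a cloud service, and the sales figures and the `as_table` helper are hypothetical.

```python
import sqlite3


def as_table(rows, headers=("product", "total units")):
    """Step 5 helper: render rows as a plain-text table for sharing."""
    widths = [max(len(str(x)) for x in col) for col in zip(headers, *rows)]
    lines = [" | ".join(str(v).ljust(w) for v, w in zip(r, widths))
             for r in (headers, *rows)]
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)


# Step 1: collect data (hypothetical monthly unit sales by product).
collected = [
    ("2023-01", "bike", 120), ("2023-01", "treadmill", 40),
    ("2023-02", "bike", 135), ("2023-02", "treadmill", 38),
]

# Step 2: store and manage it (an in-memory database used as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, product TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", collected)

# Steps 3 and 4: access and organize the data, then sort it.
rows = conn.execute(
    "SELECT product, SUM(units) AS total FROM sales "
    "GROUP BY product ORDER BY total DESC"
).fetchall()

# Step 5: present the result in an easy-to-share format.
print(as_table(rows))
```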
Data Warehousing vs. Databases
A data warehouse is not the same as a database:
- A database is a transactional system that monitors and updates real-time data in order to have only the most recent data available.
- A data warehouse is programmed to aggregate structured data over time.
For example, a database might only have the most recent address of a customer, while a data warehouse might have all the addresses for the customer for the past 10 years.
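The address example can be made concrete. In the hypothetical sketch below, `current_address` behaves like the transactional database (each move overwrites the old value), while `address_history` behaves like the warehouse (each move is kept with the date it took effect). This keep-every-version pattern resembles what dimensional modeling calls a Type 2 slowly changing dimension. All names and data here are invented for illustration.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Database-style view: one current address per customer, overwritten on change.
current_address: Dict[str, str] = {}

# Warehouse-style view: every address is kept with the date it took effect.
address_history: Dict[str, List[Tuple[date, str]]] = {}


def record_move(customer: str, new_address: str, effective: date) -> None:
    current_address[customer] = new_address  # the previous value is lost here
    address_history.setdefault(customer, []).append((effective, new_address))


def address_on(customer: str, when: date) -> Optional[str]:
    """Return the address in effect on a given date, or None if none was yet."""
    result = None
    for effective, address in sorted(address_history.get(customer, [])):
        if effective > when:
            break
        result = address
    return result


record_move("C-001", "12 Oak St", date(2015, 3, 1))
record_move("C-001", "9 Elm Ave", date(2019, 7, 15))
record_move("C-001", "400 Pine Rd", date(2023, 1, 10))
```

After these calls, the database-style view knows only "400 Pine Rd", while the history can still answer where the customer lived in 2020.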
Data mining relies on the data warehouse. The data in the warehouse are sifted for insights into the business over time.
Advantages and Disadvantages of Data Warehouses
Data warehousing is intended to give a company a competitive advantage. It creates a resource of pertinent information that can be tracked over time and analyzed in order to help a business make more informed decisions.
However, a data warehouse can also drain company resources and burden current staff with routine tasks intended to feed the warehouse machine.
The Corporate Finance Institute identifies these potential disadvantages of maintaining a data warehouse:
- It takes considerable time and effort to create and maintain the warehouse.
- Gaps in information, caused by human error, can take years to surface, damaging the integrity and usefulness of the information.
- When multiple sources are used, inconsistencies between them can cause information losses.
Pros
- Provides fact-based analysis of past company performance to inform decision-making.
- Serves as a historical archive of relevant data.
- Can be shared across key departments for maximum usefulness.
Cons
- Creating and maintaining the warehouse is resource-heavy.
- Input errors can damage the integrity of the archived information.
- Use of multiple sources can cause inconsistencies in the data.
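Inconsistencies and gaps between sources are the kind of problem a load process can check for, which may help surface them earlier. The sketch below is hypothetical: it compares customer addresses from two invented departmental exports, flags keys whose values disagree, and lists keys missing from a source. The functions `find_conflicts` and `find_gaps` are illustrative names, not part of any library.

```python
from collections import defaultdict

# Hypothetical customer records exported by two departments.
sales_system = {"C-001": "12 Oak St", "C-002": "5 Birch Ln"}
marketing_list = {"C-001": "12 Oak Street", "C-002": "5 Birch Ln",
                  "C-003": "7 Cedar Ct"}


def find_conflicts(**sources: dict) -> dict:
    """Map each key to {source: value} wherever the sources disagree."""
    by_key = defaultdict(dict)
    for name, records in sources.items():
        for key, value in records.items():
            by_key[key][name] = value
    return {key: values for key, values in by_key.items()
            if len(set(values.values())) > 1}


def find_gaps(**sources: dict) -> dict:
    """Map each key to the sorted list of sources that lack it."""
    all_keys = set().union(*sources.values())
    return {key: sorted(n for n, recs in sources.items() if key not in recs)
            for key in sorted(all_keys)
            if any(key not in recs for recs in sources.values())}


conflicts = find_conflicts(sales=sales_system, marketing=marketing_list)
gaps = find_gaps(sales=sales_system, marketing=marketing_list)
```

Note that "12 Oak St" and "12 Oak Street" probably describe the same place, but an exact comparison still flags them; deciding which differences matter is part of the cleaning work described earlier.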
Data Warehouse FAQs
Here are the answers to some commonly asked questions about data warehousing.
What Is a Data Warehouse and What Is It Used For?
A data warehouse is an information storage system for historical data that can be analyzed in numerous ways. Companies and other organizations draw on the data warehouse to gain insight into past performance and plan improvements to their operations.
What Is a Data Warehouse Example?
Consider a company that makes exercise equipment. Its best-seller is a stationary bicycle, and it is considering expanding its line and launching a new marketing campaign to support it.
It goes to its data warehouse to understand its current customers better. It can find out whether its customers are predominantly women over 50 or men under 35. It can learn more about the retailers that have been most successful in selling its bikes, and where they are located. It might also be able to access in-house survey results and find out what past customers have liked and disliked about its products.
All of this information helps the company decide what new bicycle models to build and how to market and advertise them. Its decisions rest on hard information rather than seat-of-the-pants guesswork.
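The questions in this example map naturally onto simple aggregations. The sketch below assumes a handful of invented sales records; the retailer names, cities, and field names are hypothetical. It counts buyers in the two segments the company is comparing and ranks retailer locations by units sold.

```python
from collections import Counter

# Hypothetical warehoused sales of the stationary bicycle, one row per unit.
sales = [
    {"age": 54, "gender": "F", "retailer": "FitMart", "city": "Denver"},
    {"age": 31, "gender": "M", "retailer": "CycleHub", "city": "Austin"},
    {"age": 58, "gender": "F", "retailer": "FitMart", "city": "Denver"},
    {"age": 29, "gender": "M", "retailer": "FitMart", "city": "Denver"},
    {"age": 62, "gender": "F", "retailer": "GymDirect", "city": "Boston"},
    {"age": 41, "gender": "F", "retailer": "CycleHub", "city": "Austin"},
]


def segment(sale: dict) -> str:
    """Bucket a buyer into the two groups the company is comparing."""
    if sale["gender"] == "F" and sale["age"] > 50:
        return "women over 50"
    if sale["gender"] == "M" and sale["age"] < 35:
        return "men under 35"
    return "other"


segments = Counter(segment(s) for s in sales)
# Retailer locations ranked by number of bikes sold, highest first.
top_retailers = Counter((s["retailer"], s["city"]) for s in sales).most_common()
```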
What Are the Stages of Data Warehousing?
There are at least seven stages to the creation of a data warehouse, according to ITPro Today, an industry publication. They include:
- Determining the business objectives and their key performance indicators.
- Collecting and analyzing the appropriate information.
- Identifying the core business processes that contribute the key data.
- Constructing a conceptual data model that shows how the data are displayed to the end-user.
- Locating the sources of the data and establishing a process for feeding data into the warehouse.
- Establishing a tracking duration. Data warehouses can become unwieldy. Many are built with levels of archiving, so that older information is retained in less detail.
- Implementing the plan.
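The idea of retaining older information in less detail can be sketched as a roll-up. In this hypothetical example, the `cutoff` date plays the role of the tracking duration: daily counts on or after it are kept as-is, and earlier days are summarized into monthly totals. The `archive` function and the data are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date


def archive(daily: dict, cutoff: date):
    """Keep daily detail on or after cutoff; roll older days into monthly totals."""
    recent = {day: n for day, n in daily.items() if day >= cutoff}
    monthly = defaultdict(int)
    for day, n in daily.items():
        if day < cutoff:
            monthly[(day.year, day.month)] += n  # detail is reduced to month level
    return recent, dict(monthly)


# Hypothetical daily order counts.
daily = {
    date(2022, 11, 3): 5, date(2022, 11, 20): 7, date(2022, 12, 1): 4,
    date(2023, 1, 5): 6, date(2023, 1, 6): 2,
}
recent, monthly = archive(daily, cutoff=date(2023, 1, 1))
```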
Is SQL a Data Warehouse?
No. SQL, or Structured Query Language, is a computer language used to interact with a database in terms that it can understand and respond to. It includes commands such as "select," "insert," and "update," and it is the standard language for relational database management systems. SQL is also commonly used to query data warehouses.
A database is not the same as a data warehouse, although both are stores of information. A database is an organized collection of information. A data warehouse is an information archive that is continuously built from multiple sources.
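The three SQL commands named above can be tried with Python's built-in `sqlite3` module. This is a minimal sketch using a throwaway in-memory database and a hypothetical `customers` table. It also shows why a plain database is not an archive: the `UPDATE` replaces the old value rather than keeping it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway relational database
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# INSERT adds a row.
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ana", "Denver"))

# UPDATE changes it in place; the old city is no longer stored anywhere.
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Austin", "Ana"))

# SELECT reads the current state back.
rows = conn.execute("SELECT name, city FROM customers").fetchall()
print(rows)  # [('Ana', 'Austin')]
```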
The Bottom Line
The data warehouse is a company's repository of information about its business and how it has performed over time. Created with input from employees in each of its key departments, it is the source for analysis that reveals the company's past successes and failures and informs its decision-making.