Ayadi Tahar | What is a Data Warehouse?

What is a Data Warehouse?

Publish Date: 2022-01-30


The expansion of big data and the application of new digital technologies are driving change in the way we are looking and using data to respond to different business needs.

Traditional databases are built primarily for fast queries, transaction processing and updating real-time data. Typically serves as the focused data store for a specific application, it can’t store different types of data, and has limited analytics capabilities.

When multiple applications involve as the business grows, the need for a solution that offers enterprises the ability to extract even greater value from all their data, while lowering costs and improving reliability and performance become a necessary and a must choice.

The data warehouse can respond to these new requirements, and offer advanced types of analysis in ways that a standard database cannot.

1. Definition

a data warehouse is considered to be a system and environment to manage and store a large amount of data from different sources. It’s built for data analysis and reporting which is the main core of business intelligence (BI).

We can consider a data warehouse as a “single source of truth” because it gives easy and rapid access to data to executives for planning and control, and it stores a large amount of historical data which is invaluable for analysts and can improve decision making.

Bill Inmon Inmon
Bill Inmon

Bill Inmon, the recognized father of the data warehousing concept, defines a data warehouse as a subject-orientated, integrated, time-variant, non-volatile collection of data in support of management's decision-making process (Lambert 1996)

2. Stages for Designing a Data Warehouse

There are several stages to the creation of a data warehouse, to avoid the risk associated with a Data warehouse implementation, we start by defining the specific business requirements, agreeing on the scope, and drafting a conceptual design which includes a logical design that involves the relationships between the objects, and the physical design involves the best way to store and retrieve the objects to respond to needs of the end-users.

Any data warehouse design must address the following:

  • Location and Collection of the appropriate information and determine Specific data content needed
  • creating conceptual models and Relationships within and between groups of data.
  • The systems environment that will support the data warehouse with billing consideration.
  • The types of data transformations required (data mapping) and built-in levels of archiving are needed.
  • plan to test the consistency, accuracy, and integrity of the data (Data refresh frequency )
  • maintain data warehouse architecture and keep pace with the evolving needs of end-users.
  • Automated management for simple provisioning, scaling, and administration.

3 - Components of data warehouse architecture.

The architecture of a data warehouse is determined by the organization’s specific needs. All data warehouses share a basic design, but common typical architectures include the following stages or elements:

Data Sources:

includes possible internal and external data sources needed, which can be structured, semi-structured or unstructured, with different formats (CSV, JSON, XML, JDBC, parquet...) and from transnational and relational databases (RDBMS), Files, spreadsheets according to business needed.

Staging Area:

data should be processed and cleaned before being put in the warehouse. to simplify data preparation, the process of Extraction, Transformation, and Loading (ETL) ensure the extraction of data from different sources, transform it into an acceptably format for mapping and further manipulation (cleansing, scrubbing and duplication ), then loaded to the central repository. Mostly this step can be done programmatically.

data warehouse architecture
Data Warehouse Architecture
Central repository:

The repository is fed by data sources of different types of metadata, summary data, and raw data and are stored for a long duration (years) in an organized way,

Its physical implementation is different from one to another and may involve many vendors and enterprise-grade products.

Once stored in the warehouse, the data goes through sorting, consolidating, and summarizing, so that it will be easier to use, and ensure its consistency.

Over time, more data are added to the warehouse as the various data sources are updated.

Data marts:

When the data is ready for use, it is moved to the appropriate data mart. A data mart performs the same functions as a data warehouse but within a much more limited scope—usually a single department ( such as sales, finance) or line of business (by product, project).

This makes data marts easier to establish than data warehouses. However, they tend to introduce inconsistency because it can be difficult to uniformly manage and control data across numerous data marts, so it is always used interchangeably with a data warehouse.

Query Manager and Analysis:

the last step where Statistical analysis, reporting, and data mining capabilities are performed by end-users. more sophisticated analytical applications are used at this step to generate actionable information by applying data science and artificial intelligence (AI) algorithms, or graph and spatial features that enable more kinds of analysis of data at scale.

4. Advantages and Disadvantages of DWH

Advantages of Data Warehouse

Data warehouse is a first step If you want to discover ‘hidden patterns’ of data-flows and groupings, as it provides consistent information on various cross-functional activities such as ad-hoc query and reporting, data mining, artificial intelligence, and machine learning.

Data warehousing is intended to give a company a competitive advantage. It creates a resource of pertinent information that can be tracked over time and analyzed to help a business make more informed decisions.

Data warehouse allows users to access critical data from several sources in a single place and shared it across key departments for maximum usefulness, at it also reduces stress on the production system. Therefore, it saves users time in retrieving data from multiple sources and Provides fact-based analysis on past company performance.

Disadvantages of Data Warehouse:

We can identify the potential disadvantages of maintaining a data warehouse as:

Creation and Implementation of Data Warehouse is surely time confusing affair. At the same time can be outdated relatively quickly and become a big garbage collection if it is not designed carefully.

The data warehouses may seem easy, but actually, it is too complex for the average users.

Sometimes warehouse users will develop different business rules. And despite best efforts at project management, data warehousing project scope will always increase.

Input errors can take longer to surface, which can damage the integrity and usefulness of the information.

The use of multiple sources can cause inconsistencies in the data and information losses.

As the size of the databases grows, It becomes more complex to build, maintain and run data warehouse systems which are always increasing in size, time-consuming and resource-heavy.

Change in Regulatory constraints may limit the ability to combine sources of disparate data, which will introduce difficulties in making changes in data types and ranges, data source schema, indexes, and queries.

5. What is a Cloud Data Warehouse?

Data warehousing systems have been a part of business intelligence (BI) solutions for over three decades, but they have evolved recently with the emergence of new data types and data hosting methods.

These on-premises data warehouses continue to have many advantages today. In many cases, they can offer improved governance, security, data sovereignty, and better latency. However, on-premises data warehouses are not as elastic and they require complex forecasting to determine how to scale the data warehouse for future needs. Managing these data warehouses can also be very complex.

Traditionally, a data warehouse was hosted on-premises, and its functionality was focused on extracting data from other sources, cleansing and preparing the data, and loading and maintaining the data in relational databases.

More recently, a data warehouse might be hosted on a dedicated appliance or in the cloud, and most data warehouses have added analytics capabilities and data visualization and presentation tools.

A cloud data warehouse uses the cloud to ingest and store data from disparate data sources. It offers the same characteristics and benefits of on-premises data warehouses but with the added benefits of cloud computing―such as flexibility, scalability, agility, security, and reduced costs.

Cloud data warehouses allow enterprises to focus solely on extracting value from their data rather than having to build and manage the hardware and software infrastructure to support the data warehouse.

Today, businesses can invest in cloud-based data warehouse software services from companies including Microsoft, Google, Amazon, and Oracle, among others.

6. Final thoughts

A data warehouse (also known as an Enterprise Data Warehouse (EDW)) is an information archive that is continuously built from multiple sources.

The data warehouse is a company's repository of information about its business and how it has performed over time.

Data warehouse is used in diverse industries like Airline, Banking, Healthcare, Insurance, Retail etc.

Data warehouse allows business users to quickly access critical data from some sources all in one place.

Data added to the warehouse do not change and cannot be altered. it stored in a manner that is secure, reliable, easy to retrieve, and easy to manage.

Data warehousing makes data mining possible. Data mining is looking for patterns in the data that may lead to higher sales and profits.

Cloud-based data warehouses have grown more popular over the last five to seven years as more companies use cloud services and seek to reduce their on-premises data centre footprint.