Ayadi Tahar

[draft] What is a Data Lake?

Publish Date: 2022-10-17

A data lake is a central repository where you can store all your data in its native format until it is needed for analytic applications, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, in order to guide better decisions...

Realtime Analysis using Spark Streaming and Kafka

Publish Date: 2022-10-09

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with...

Apache Spark and Cassandra: Best NoSQL Big Data combination

Publish Date: 2022-10-07

Apache Spark™ is considered one of the most powerful engines for executing data engineering, data science, and machine learning workloads on single-node machines or clusters, over diverse data sources such as NoSQL databases.

Apache...

How to use and Implement a Stack?

Publish Date: 2022-10-04

A stack represents a sequence of objects or elements in a linear data structure format and is based on the principle of Last In First Out (LIFO).

It is commonly used as an abstract data type with two major operations, namely push and pop, which are carried out on the topmost el...
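
As a rough illustration of the LIFO behaviour and the push and pop operations mentioned in this teaser, here is a minimal Python sketch. It is not taken from the article itself: the class name `Stack` and the helper methods `peek` and `is_empty` are assumptions made for the example, and a plain Python list is used as the backing store.

```python
class Stack:
    """A minimal LIFO stack backed by a Python list (illustrative sketch)."""

    def __init__(self):
        self._items = []  # the end of the list acts as the top of the stack

    def push(self, item):
        """Place an item on top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the topmost item; fail loudly if the stack is empty."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        """Return the topmost item without removing it (hypothetical helper)."""
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        """Report whether the stack holds no items (hypothetical helper)."""
        return not self._items


stack = Stack()
for value in (1, 2, 3):
    stack.push(value)

top = stack.peek()                        # last pushed item, still on the stack
popped = [stack.pop() for _ in range(3)]  # items come back in reverse push order
print(top, popped)
```

Because both operations work on the end of the list, each push and pop touches only the topmost element, which is what makes the last item pushed the first one popped.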

Read/Write data from/to PostgreSQL tables using Spark

Publish Date: 2022-10-02

Apache Spark is a fast, general-purpose computing engine used for big data processing. It can process data from different sources and formats, including relational databases like PostgreSQL.

In this article, we will show you how to read data from PostgreSQL t...

Data Structures

Data Engineering

Algorithms

Data Science

Linux

Deploy Minio in Openshift

Run Ansible playbooks Using Red Hat Satellite

Controlling pod placement onto nodes in OpenShift

Create a storage class for NFS dynamic storage provisioning in OpenShift

Windows management with Ansible

Running Spark on Kubernetes with AKS

Data Lakehouse, The Best of Both Worlds

CAP Theorem, Does it still hold in modern days

Pandas API over Pyspark