
Angola Economic ETL Pipeline

Python · Pandas · SQLite · SQL · pytest · World Bank API

Overview

A production-style ETL pipeline that extracts macroeconomic data for Angola from the World Bank Open Data API, transforms it with Pandas, and loads it incrementally into a SQLite data warehouse — complete with analytical SQL views and automated data quality checks.

Built as a portfolio project to demonstrate hands-on Data Engineering skills: ETL design, SQL modelling, Python automation, and data governance.


Key Features

  • Extracts 8 macroeconomic indicators: GDP, inflation, current account balance, FDI, unemployment, GDP per capita, exports, and imports
  • Pandas-based transformer with type casting, deduplication, and anomaly flagging
  • Incremental loading with upsert logic — no duplicate records on re-runs
  • 4 analytical SQL views: latest values, year-on-year growth, anomalies, and indicator summaries
  • 6 automated data quality checks: completeness, freshness, anomalies, gap detection, and more
  • JSON quality reports generated after every pipeline run
  • pytest test suite for the transformer layer
  • CLI support for custom indicator selection
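The transformer behaviour listed above (type casting, deduplication, anomaly flagging) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the `transform` function, the choice to drop rows with missing values, and the z-score rule with a threshold of 3 are all assumptions made for the example.

```python
import pandas as pd


def transform(records: list[dict], z_threshold: float = 3.0) -> pd.DataFrame:
    """Cast types, drop duplicate observations, and flag outliers.

    Hypothetical sketch: expects dicts with 'indicator', 'year', 'value' keys.
    """
    df = pd.DataFrame(records, columns=["indicator", "year", "value"])

    # Type casting: unparseable entries become NaN instead of raising.
    df = df.assign(
        year=pd.to_numeric(df["year"], errors="coerce"),
        value=pd.to_numeric(df["value"], errors="coerce"),
    )
    # Assumption: observations without a usable year or value are dropped.
    df = df.dropna(subset=["year", "value"]).astype({"year": "int64"})

    # Deduplication: keep the last row seen for each (indicator, year) pair.
    df = (
        df.drop_duplicates(subset=["indicator", "year"], keep="last")
        .sort_values(["indicator", "year"])
        .reset_index(drop=True)
    )

    # Anomaly flagging (assumed rule): |z-score| within each indicator
    # exceeds the threshold. Constant or single-row groups yield NaN,
    # which compares as False and is therefore never flagged.
    grouped = df.groupby("indicator")["value"]
    z = (df["value"] - grouped.transform("mean")) / grouped.transform("std")
    df["is_anomaly"] = z.abs() > z_threshold
    return df
```

Flagging rather than deleting suspicious values keeps the raw series intact, so a downstream view can decide how to treat them.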

Technical Approach

The pipeline follows a classic ETL architecture with four distinct stages: Extract (World Bank API client with retry and pagination), Transform (Pandas cleaning, normalization, anomaly detection), Load (SQLite upsert with schema DDL and SQL views), and Quality Check (automated data governance).
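As an illustration of the Extract stage, the sketch below pages through a World Bank v2 indicator endpoint and retries failed requests with exponential backoff. The function names (`fetch_page`, `extract_indicator`), the retry count, and the backoff schedule are hypothetical; the project's own client may be organised differently. The page fetcher is injectable so that the pagination logic can be tested without network access.

```python
import time
from typing import Any, Callable, Iterator

BASE_URL = "https://api.worldbank.org/v2/country/{country}/indicator/{indicator}"


def fetch_page(url: str, params: dict, retries: int = 3, backoff: float = 1.0) -> Any:
    """GET one page as JSON, retrying failures with exponential backoff."""
    import requests  # imported lazily so the paging logic is usable offline

    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, params=params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            time.sleep(backoff * 2 ** (attempt - 1))


def extract_indicator(
    indicator: str,
    country: str = "AGO",
    per_page: int = 100,
    get_page: Callable[[str, dict], Any] = fetch_page,
) -> Iterator[dict]:
    """Yield one flat record per observation across all result pages."""
    url = BASE_URL.format(country=country, indicator=indicator)
    page, pages = 1, 1
    while page <= pages:
        payload = get_page(url, {"format": "json", "page": page, "per_page": per_page})
        # Successful responses are a two-element list: [metadata, observations].
        if not isinstance(payload, list) or len(payload) < 2:
            raise ValueError(f"Unexpected response for {indicator}: {payload!r}")
        meta, rows = payload[0], payload[1] or []
        pages = int(meta["pages"])
        for row in rows:
            yield {
                "indicator": row["indicator"]["id"],
                "year": row["date"],
                "value": row["value"],
            }
        page += 1
```

For example, `list(extract_indicator("NY.GDP.MKTP.CD"))` would collect Angola's GDP series (current US$) as a list of dictionaries.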

The Extractor uses the requests library with built-in retry logic and pagination to reliably pull data from the World Bank API. The Transformer applies type casting, deduplication, and statistical anomaly detection using Pandas. The Loader implements an upsert pattern to support incremental loads without data duplication.
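One common way to get the upsert behaviour described here in SQLite is `INSERT ... ON CONFLICT ... DO UPDATE`, keyed on a composite primary key. The sketch below assumes a hypothetical `observations` table with `(indicator, year)` as the key; the table, column, and function names are illustrative rather than taken from the project, and the syntax requires SQLite 3.24 or later.

```python
import sqlite3

# Hypothetical schema: one row per indicator per year.
DDL = """
CREATE TABLE IF NOT EXISTS observations (
    indicator TEXT    NOT NULL,
    year      INTEGER NOT NULL,
    value     REAL,
    loaded_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (indicator, year)
)
"""

# Insert new rows; overwrite the value when the key already exists.
UPSERT = """
INSERT INTO observations (indicator, year, value)
VALUES (?, ?, ?)
ON CONFLICT (indicator, year) DO UPDATE SET
    value = excluded.value,
    loaded_at = CURRENT_TIMESTAMP
"""


def load(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Upsert (indicator, year, value) tuples and return the table row count."""
    conn.execute(DDL)
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(UPSERT, rows)
    return conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
```

Running `load` twice with overlapping rows leaves one row per key, which is what makes re-runs safe. The same `ON CONFLICT` form is accepted by PostgreSQL; SQL Server would need a `MERGE` statement or an equivalent update-then-insert pattern instead.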

SQL views provide pre-built analytical queries for common business questions — GDP trends, inflation tracking, and cross-indicator comparisons. The quality checker validates completeness, freshness, and data integrity after every load.
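To make the view and quality-check ideas concrete, here is a small sketch of a year-on-year growth view and two post-load checks (freshness and gap detection). It assumes an `observations(indicator, year, value)` table; the view name `v_yoy_growth`, the `quality_report` function, and the two-year freshness window are hypothetical choices for this example. The view uses a self-join rather than a window function so that it ports easily to other databases.

```python
import sqlite3
from datetime import date
from typing import Optional

# Year-on-year growth: pair each observation with the previous year's value.
YOY_VIEW = """
CREATE VIEW IF NOT EXISTS v_yoy_growth AS
SELECT cur.indicator,
       cur.year,
       cur.value,
       ROUND(100.0 * (cur.value - prev.value) / ABS(prev.value), 2) AS yoy_pct
FROM observations AS cur
JOIN observations AS prev
  ON prev.indicator = cur.indicator
 AND prev.year = cur.year - 1
WHERE prev.value IS NOT NULL AND prev.value <> 0
"""


def quality_report(
    conn: sqlite3.Connection,
    max_age_years: int = 2,
    today: Optional[date] = None,
) -> dict:
    """Return per-indicator freshness and gap counts as a JSON-ready dict."""
    today = today or date.today()
    rows = conn.execute(
        "SELECT indicator, MIN(year), MAX(year), COUNT(value) "
        "FROM observations GROUP BY indicator"
    ).fetchall()
    report = {}
    for indicator, first_year, last_year, non_null in rows:
        expected = last_year - first_year + 1
        report[indicator] = {
            # Freshness: latest observation is recent enough.
            "fresh": today.year - last_year <= max_age_years,
            # Gap detection: years in the span with no usable value.
            "missing_years": expected - non_null,
        }
    return report
```

The returned dictionary can be written out with `json.dumps(report, indent=2)` to produce a report file after each run.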

Outcomes

  • Modular codebase with separate extractor, transformer, loader, and quality-check components
  • Portable SQL patterns designed for easy migration to SQL Server or PostgreSQL
  • No database server to set up: SQLite runs in-process and is supported by Python's standard library
  • MIT licensed and publicly available as an educational resource