Pedro Zacharias — Senior Data Engineer

Experience

Senior Data Engineer (In Progress)

OLX Group

May 2026 — Present · 1 mo

Recently promoted to Senior Data Engineer; the role has just started, with more to come.

Data Engineer

OLX Group

Apr 2024 — May 2026 · 2 yrs 2 mos

Part of the Data Domain Team implementing Data Mesh principles. Evolved from building robust data products to developing AI-powered solutions, collaborating with Backend Engineers, Product Analysts, and Data Scientists to deliver domain-oriented, decentralized data ownership.

Built domain-oriented data products for Motors platforms serving 2M+ MAU and processing 10M+ behavioural events/day.
Reduced daily pipeline runtime from 8h to 1h and cut Amazon Redshift Spectrum costs by 80% by refactoring datasets (column optimization, table decoupling, idempotent Airflow pipelines).
Operationalised Data Science workflows: refactored Python/Spark/SQL code, set up CI/CD (GitLab, Docker), provisioned infrastructure with Terraform, and integrated MLflow for reproducible tracking.
Developed and maintained low-latency APIs (AWS API Gateway, FastAPI) handling 30K+ monthly requests, with monitoring via New Relic dashboards and alerts.
Developed an internal AI agent transforming natural-language business questions into actionable answers, using LLMs, vector search, and RAG-style pipelines served through company-wide APIs and MCP tools.

Junior Data Engineer

OLX Group

Nov 2022 — Apr 2024 · 1 yr 6 mos

Part of the Data Domain Team implementing Data Mesh principles across the company: developed data pipelines, refined data models, and drove data engineering best practices to foster decentralized data ownership and domain-oriented design using AWS, Trino, and Airflow.

Developed new domain-oriented data models to support key domain analysis: modelled data from business needs and built batch-processing pipelines. Refactored unstructured processes into optimized models by decoupling large tables, making pipelines idempotent, and optimizing column partitions, distributions, and compression. Reduced Amazon Redshift Spectrum usage by up to 80% by syncing data from S3 to Redshift.
Assisted Analytics and Data Science teams in setting up Data Infrastructure resources, including S3, GitLab, and Airflow instances; promoted decentralized Data Ownership through Data Contracts, Data Quality alerts, Catalog Management, and governance practices.
Acted as Scrum Master for a team of 9 Data Engineers: facilitated Scrum ceremonies, organized Jira to team standards, implemented an Incident & Response Workflow with PagerDuty, and introduced OKR monitoring in Jira.

Junior / Mid Data Engineer

Hiscox

May 2020 — Nov 2022

Part of the team responsible for building the European Data Ecosystem from scratch for over 400 employees, using current Azure cloud technologies.

Delivered critical data pipelines and reports for the Underwriting domain, helping achieve the team's objective of delivering over 50 reports in 1 year.
Implemented data pipelines in a Data Lake + Data Warehouse architecture and built Power BI reports to present the final data.

Junior Consultant

Winsig

Sep 2019 — Apr 2020

Professional internship as a consultant at the leading consulting company for PHC ERP software.

Education

MSc Business Intelligence

NOVA IMS University

2019 — 2021

GPA 16 / 20 · Best Courses: Big Data, Final Project, Business Intelligence II

Master's Thesis: Developed a framework for Lisbon City Council to analyse 5 years of data on Local Cultural Policies, improving the management of their main activities. Built in collaboration with the Lisbon Municipal Directorate of Culture, based on the UNESCO Framework for Cultural Statistics (FCS). Data integration, transformation, and presentation were done using Power BI as an ETL tool and report builder. The final report is actively used by the entity to manage financial and non-financial support to cultural agents.

Grade: 18 / 20 · Published in UNL Repository

BSc Business Administration

Universidade Europeia

2016 — 2019

GPA 15 / 20 · Merit scholarship for the entire degree

Best Courses: Financial Calculations, Governance Models, Management Cases, Cost Accounting, Leadership and Team Management.

Skills & Technologies

Cloud & Infra

AWS, Azure, Terraform, K8S, Docker

Data & Pipelines

Airflow, Redshift, Trino, Spark, Power BI

AI & ML

LLMs, RAG, PydanticAI, LangGraph, MLflow, Qdrant, Arize

Languages

Python, SQL, Bash

APIs & DevOps

FastAPI, MCP, GitLab CI, New Relic

Data Governance

OpenMetadata, Data Mesh, dbt

Certifications

GenAI Nanodegree — Udacity

AWS Cloud Practitioner

DP-203 — Azure Data Engineer

DA-100 — Power BI

DP-900 — Azure Data Fundamentals

AZ-900 — Azure Fundamentals

Languages & Interests

Languages

Portuguese: Native · English: Fluent

Interests

Travel, Cooking, Data, Investments