Overview

A company with some Adobe Analytics data needed them in a PostgreSQL database, and they wanted regular automatic updates. Another dev had already had a crack but bailed for unknown reasons. I was given his Jupyter Notebook and in theory just had to refactor the code into a better setup, and I only had a couple evenings and a morning to pull it off. That sounded simple enough, and after a seemingly successful morning, I planned to finish the project late that evening, until I discovered that things would not be so simple, and the volume of data was too large to wait around for. At this point, I had to hand it back to the boss, as I had no more time. It proved a time sink for him also, due to oddities with the design of the pipeline resulting in an infinite loop.

Features

Pivots data into new schemas.
Runs automatically on a server.
Errors introduce breaking changes when appropriate, so that they are not baked in.

Tools Involved

Python
PostgreSQL database
Jupyter Notebook
Singer.io
JSON schema
YAML and setup files
Git
Environnemental variables
WSL
Google Sheets API
A virtual environment

The Steps

Identifying key endpoints.
Scoping out the details of the API.
Testing the Jupyter Notebook
Refactoring the code.
Writing the schemas and loops.
Testing.
Code cleanup.

Adobe Analytics to Postgres Pipeline

Overview

Features

Tools Involved

The Steps

More projects

Grappling HQ

Spitfire Records

Pipeline Google Sheets to PostgreSQL

What Is Truth