Adobe Analytics to Postgres Pipeline

Overview

A company had Adobe Analytics data that they needed in a PostgreSQL database, updated automatically on a regular schedule. Another developer had already had a crack at it but bailed for unknown reasons. I was given his Jupyter Notebook and, in theory, just had to refactor the code into a better setup, with only a couple of evenings and a morning to pull it off. That sounded simple enough, and after a seemingly successful morning I planned to finish the project that evening. Then I discovered things would not be so simple: the volume of data was too large to wait around for. At that point I had no time left and had to hand the project back to the boss. It proved a time sink for him too, because an oddity in the pipeline's design resulted in an infinite loop.

Features

  • Pivots data into new schemas.
  • Runs automatically on a server.
  • Errors fail loudly when appropriate, so that bad data is not silently baked in.
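The pivoting step can be sketched with pandas. The column names and values below are hypothetical stand-ins, not the actual Adobe Analytics fields; the idea is simply reshaping long-format API rows into a wide table matching the target Postgres schema.

```python
import pandas as pd

# Hypothetical long-format rows as an analytics API might return them:
# one row per (date, metric) pair.
rows = [
    {"date": "2023-01-01", "metric": "visits", "value": 120},
    {"date": "2023-01-01", "metric": "orders", "value": 8},
    {"date": "2023-01-02", "metric": "visits", "value": 95},
    {"date": "2023-01-02", "metric": "orders", "value": 5},
]

df = pd.DataFrame(rows)

# Pivot into a wide schema: one row per date, one column per metric.
wide = df.pivot(index="date", columns="metric", values="value").reset_index()
print(wide)
```

The wide result maps directly onto a table with one column per metric, which is easier to query than the long form.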

Tools Involved

  • Python
  • PostgreSQL database
  • Jupyter Notebook
  • Singer.io
  • JSON schema
  • YAML and setup files
  • Git
  • Environment variables
  • WSL
  • Google Sheets API
  • A virtual environment
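The environment variables held things like database credentials. A minimal sketch of reading them at startup, with invented variable names (the real pipeline's names may differ), and failing loudly if anything is missing rather than blowing up mid-run:

```python
# Build a Postgres connection config from environment variables.
# The PG_* names here are hypothetical examples.
def pg_config(env):
    required = ("PG_HOST", "PG_DATABASE", "PG_USER", "PG_PASSWORD")
    missing = [name for name in required if name not in env]
    if missing:
        # Fail loudly at startup so a misconfiguration is obvious immediately.
        raise RuntimeError(f"Missing environment variables: {missing}")
    return {
        "host": env["PG_HOST"],
        "port": int(env.get("PG_PORT", "5432")),  # optional, with a default
        "dbname": env["PG_DATABASE"],
        "user": env["PG_USER"],
        "password": env["PG_PASSWORD"],
    }
```

In practice you would pass `os.environ` in; taking a plain mapping keeps the function trivial to test.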

The Steps

  • Identifying key endpoints.
  • Scoping out the details of the API.
  • Testing the Jupyter Notebook.
  • Refactoring the code.
  • Writing the schemas and loops.
  • Testing.
  • Code cleanup.
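The loop-writing step is where the infinite-loop pitfall lives. A sketch of a paginated fetch loop with an explicit termination guard, the kind of safeguard that would have surfaced the bug quickly; `fetch_page` is a hypothetical stand-in for the real Adobe Analytics API call:

```python
# Fetch all pages from a paginated API, refusing to loop forever.
# `fetch_page(page)` is a hypothetical callable returning a list of
# records for that page, or an empty list when there are no more.
def fetch_all(fetch_page, max_pages=1000):
    records = []
    page = 0
    while page < max_pages:  # hard upper bound: never spin indefinitely
        batch = fetch_page(page)
        if not batch:  # an empty page signals the end of the data
            return records
        records.extend(batch)
        page += 1
    # If we get here, pagination never terminated: fail loudly.
    raise RuntimeError(f"Gave up after {max_pages} pages; possible pagination bug")
```

The hard page cap turns a silent hang into an immediate, diagnosable error, consistent with the fail-loudly feature above.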
