Self-Hosting Octopipe

Octopipe is designed to provide a first-class local development experience. Self-hosting lets you run and test pipelines on your own infrastructure, giving you full control over the environment and the ability to debug issues in real time.

Why Self-Host?

  • Local Development: Focus on rapid development and testing without the overhead of cloud deployment.
  • Real-Time Monitoring: Access detailed logs and status updates to troubleshoot and optimize pipeline performance.
  • Full Control: Customize your environment to suit specific development needs.

Setting Up Your Local Environment

Prerequisites

Ensure your system meets the following requirements:
  • Python 3.8+ installed.
  • Node.js and npm installed.
  • Docker and Docker Compose (recommended for managing multiple services).
  • Git for source control.
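
A quick way to confirm these are installed and on your PATH:
python3 --version
node --version && npm --version
docker --version && docker-compose --version
git --version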

Step 1: Clone the Repository

Clone the Octopipe repository from GitHub:
git clone https://github.com/your-org/octopipe.git
cd octopipe

Step 2: Install Dependencies

Install Python dependencies:
pip install -r requirements.txt
If Node.js dependencies are needed, run:
npm install
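
If you want to keep Octopipe's Python dependencies isolated from other projects, a virtual environment is a reasonable default (a minimal sketch using the standard venv module; the .venv directory name is just a convention):
# Create and activate an isolated environment, then install the requirements
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt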

Step 3: Set Up Docker Compose

For a self-hosted setup, Docker Compose can launch all of the required services (Meltano, Airflow, Kafka, Spark, etc.). Create or update the docker-compose.yml file with the required services:
version: '3.8'
services:
  octopipe:
    image: your-org/octopipe:latest
    ports:
      - "8000:8000"
    environment:
      - OCTOPIPE_ENV=local
  airflow:
    image: apache/airflow:2.2.2
    ports:
      - "8080:8080"
  kafka:
    image: confluentinc/cp-kafka:latest
  spark:
    image: bitnami/spark:latest
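
Before launching anything, you can ask Compose to validate and print the resolved configuration:
docker-compose config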

Tip: Customize the configuration as per your environment and resource availability.

Step 4: Launch the Environment

Start all services using Docker Compose:
docker-compose up
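
If you prefer to run the stack in the background, the usual Compose commands for detached mode, status, and logs apply (the octopipe service name matches the compose file above):
docker-compose up -d                 # start all services in the background
docker-compose ps                    # confirm the containers are running
docker-compose logs -f octopipe      # follow logs for the octopipe service
docker-compose down                  # stop and remove the containers when done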

This brings up all the required services in one go, making local development easier to manage.

Running and Testing Pipelines Locally

Initialize a New Pipeline:
octopipe init --name local_pipeline --description "Local development pipeline" --local

Manage Components: Add data sources, destinations, and transformations as per your project requirements.

Start and Monitor Pipelines:
octopipe start local_pipeline
octopipe logs local_pipeline --follow

Monitoring and Debugging

Real-Time Logs: Use the logs command to stream output to your terminal, allowing for on-the-fly debugging.

Status Checks: Regularly check pipeline status with:
octopipe status local_pipeline
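
For longer runs, a small polling loop around the status command keeps the latest state on screen (a sketch; adjust the interval to taste):
# Print the pipeline status every 30 seconds; stop with Ctrl-C
while true; do
  octopipe status local_pipeline
  sleep 30
done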

Step-by-Step Debugging: If a pipeline fails, stop it, inspect the logs, adjust the configuration, and restart:
octopipe stop local_pipeline
octopipe start local_pipeline

Tips for an Amazing Local Experience

  • Use a Dedicated Environment: Run Octopipe in a separate virtual machine or container to avoid conflicts with other applications.
  • Automate Routine Tasks: Use scripts to automate repetitive tasks such as starting and stopping services (an example script appears at the end of this guide).
  • Document Local Configurations: Keep notes on any local tweaks to facilitate quick troubleshooting and team onboarding.

Conclusion

Self-hosting Octopipe offers a powerful and flexible way to develop, test, and optimize your data pipelines locally. With detailed logs, easy management of services through Docker Compose, and robust CLI tools, you can enjoy a development experience that is both efficient and scalable. Embrace the freedom of local development, and fine-tune your pipelines before deploying them to production!
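
Example: Automation Script

As a starting point for the automation tip above, a small wrapper script can bring up the Compose stack, start the pipeline, and follow its logs in one step (a sketch that only uses commands shown in this guide; the dev-up.sh name is illustrative):
#!/usr/bin/env bash
# dev-up.sh (example name) — start the local stack and the pipeline, then follow logs
set -euo pipefail

docker-compose up -d                      # launch Octopipe, Airflow, Kafka, Spark
octopipe start local_pipeline             # start the pipeline created with octopipe init
octopipe logs local_pipeline --follow     # stream logs until interrupted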