
Building an E-commerce Analytics Pipeline with CoginitiScript

Welcome to this comprehensive tutorial on building data transformation pipelines with CoginitiScript! We'll use a fictional e-commerce store called Varenyk Shop to demonstrate how to transform raw sales data into analytics-ready datasets.

What You'll Learn

By the end of this tutorial, you'll understand how to:

  • Structure CoginitiScript projects using packages and layers
  • Create reusable source data abstractions
  • Build analytical datasets with RFM (Recency, Frequency, Monetary) analysis
  • Use dynamic SQL generation for flexible reporting
  • Implement publication strategies for data materialization
  • Set up automated data pipelines

Prerequisites

  • Basic understanding of SQL
  • Access to a PostgreSQL database
  • Coginiti Team or Enterprise account
  • Familiarity with CoginitiScript basics (see CoginitiScript Reference)

About Varenyk Shop

Varenyk Shop is a fictional e-commerce store that sells varenyky - traditional Ukrainian dumplings. The business needs to analyze customer behavior, order patterns, and payment preferences to make data-driven decisions.

Raw Data Overview

Our source system contains the following tables with raw transactional data:

Source Tables:

  • src_customers - Customer information
  • src_products - Product catalog
  • src_orders - Order headers
  • src_order_lines - Order line items
  • src_order_statuses - Order status lookup
  • src_payment_types - Payment method lookup

Analytics Goals: We want to transform this raw data into two main analytics datasets:

  • Customers dataset - Customer analysis with RFM metrics
  • Orders dataset - Order analysis with payment breakdowns

Tutorial Setup

Step 1: Download and Import the Project

  1. Download the Coginiti catalog package: varenyk_shop.coginitipkg
  2. Import this package into your Coginiti catalog root folder
  3. Navigate to the imported project in your catalog

Step 2: Create Sample Data

  1. Open a new query tab in Coginiti
  2. Select your PostgreSQL database connection
  3. Run the following script to generate sample data:
#+import "tutorials/varenyk_shop/seed"

{{ seed.Run() }};

This creates all source tables and populates them with sample e-commerce data.
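
To sanity-check the seed, a quick row count against a few of the raw tables (table names from the source list above) should return non-zero results:

SELECT 'src_customers' AS table_name, COUNT(*) AS row_count FROM src_customers
UNION ALL
SELECT 'src_orders', COUNT(*) FROM src_orders
UNION ALL
SELECT 'src_order_lines', COUNT(*) FROM src_order_lines;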

Step 3: Execute the Pipeline

  1. Open tutorials/varenyk_shop/schedule_nightly
  2. Execute it on the same PostgreSQL database
  3. Verify that customers and orders tables are created
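
One way to verify is a standard PostgreSQL catalog query (this assumes the tables were published to the schema your connection points at):

SELECT table_name
FROM information_schema.tables
WHERE table_name IN ('customers', 'orders');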

Now let's dive into how this pipeline is structured and implemented!

Project Structure

The tutorial project follows a layered architecture pattern:

tutorials/
└── varenyk_shop/
    ├── reporting/              # Analytics layer
    │   ├── customers           # Customer analytics dataset
    │   └── orders              # Order analytics dataset
    ├── sources/                # Source data abstraction layer
    │   └── sources             # Source table definitions
    ├── seed/                   # Sample data generation
    │   ├── gen/
    │   │   └── person          # Person data generator
    │   ├── clean               # Data cleanup utilities
    │   └── source_data         # Source data creation
    └── schedule_nightly        # Pipeline orchestration

This structure demonstrates several CoginitiScript best practices:

  • Layered architecture for maintainability
  • Package organization for code reuse
  • Separation of concerns between data layers

Layer 1: Source Data Abstraction

Why Create a Source Layer?

The source layer provides several key benefits:

  • Abstraction: Downstream transformations reference blocks, not tables directly
  • Maintainability: Changes to source systems only affect the source layer
  • Cleanliness: Transformation logic stays focused on business rules
  • Consistency: Standardized column names and data types

Source Block Definitions

Let's examine the source blocks in tutorials/varenyk_shop/sources/sources:

#+src sql Customers()
#+begin
SELECT
    id AS customer_id,
    first_name,
    last_name,
    email
FROM
    src_customers;
#+end

#+src sql Products()
#+begin
SELECT
    id AS product_id,
    name
FROM
    src_products;
#+end

#+src sql Orders()
#+begin
SELECT
    id AS order_id,
    order_date,
    customer_id,
    payment_type_id,
    order_status
FROM
    src_orders;
#+end

#+src sql OrderLines()
#+begin
SELECT
    id AS order_line_id,
    order_id,
    product_id,
    quantity,
    price
FROM
    src_order_lines;
#+end

#+src sql OrderStatuses()
#+begin
SELECT
    id AS order_status_id,
    status_name
FROM
    src_order_statuses;
#+end

#+src sql PaymentTypes()
#+begin
SELECT
    id AS payment_type_id,
    payment_type_name
FROM
    src_payment_types;
#+end

Key Features:

  • Column renaming: Standardize ID column names across tables
  • Column selection: Only expose needed columns
  • Clean interface: Downstream consumers don't need to know about source table structure
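
Any downstream asset can reference these blocks instead of the raw tables. As a minimal sanity check (assuming the seed data is loaded), a query like this confirms the source layer resolves:

#+import "tutorials/varenyk_shop/sources"

SELECT COUNT(*) AS customer_count
FROM {{ sources.Customers() }};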

Layer 2: Analytics Models

Customer Analytics with RFM Analysis

RFM analysis is a proven method for customer segmentation:

  • Recency: How recently did the customer make a purchase?
  • Frequency: How often does the customer purchase?
  • Monetary Value: How much does the customer spend?
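
As a rough illustration of how these dimensions become scores, the sketch below assigns quintile-based R, F, and M scores on top of the published customers table (NTILE is standard PostgreSQL; the scoring scheme is an example, not part of the tutorial project):

SELECT
    customer_id,
    NTILE(5) OVER (ORDER BY recent_order) AS recency_score,   -- 5 = most recent purchase
    NTILE(5) OVER (ORDER BY total_orders) AS frequency_score, -- 5 = most orders
    NTILE(5) OVER (ORDER BY ltv) AS monetary_score            -- 5 = highest lifetime spend
FROM customers;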

Implementation Strategy

We split the customer analytics into two blocks for better maintainability:

File: tutorials/varenyk_shop/reporting/customers

#+import "tutorials/varenyk_shop/sources"

#+src sql rfmValues()
#+meta {
  :doc "Calculates RFM metrics for each customer:
    - first_order: Date of first purchase
    - recent_order: Date of most recent purchase
    - total_orders: Total number of orders (Frequency)
    - total_sales: Total amount spent (Monetary Value)"
}
#+begin
SELECT
    customer_id,
    MIN(o.order_date) AS first_order,
    MAX(o.order_date) AS recent_order,
    COUNT(o.order_id) AS total_orders,
    SUM(ol.price) AS total_sales
FROM
    {{ sources.Orders() }} AS o
    INNER JOIN {{ sources.OrderLines() }} AS ol
        ON ol.order_id = o.order_id
GROUP BY
    customer_id;
#+end

#+src sql Customers()
#+meta {
  :publication {
    :type "table",
    :name "customers"
  }
}
#+begin
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    r.first_order,
    r.recent_order,
    r.total_orders,
    r.total_sales AS ltv  -- Lifetime Value
FROM
    {{ sources.Customers() }} AS c
    LEFT JOIN {{ rfmValues() }} AS r
        ON r.customer_id = c.customer_id;
#+end

Design Highlights:

  • Block composition: rfmValues() calculates metrics, Customers() assembles final result
  • Source references: Uses sources.Orders() and sources.OrderLines() blocks
  • Publication strategy: Materializes result as customers table
  • Documentation: Clear metadata explaining business logic

Order Analytics with Dynamic Payment Columns

The orders dataset demonstrates CoginitiScript's dynamic SQL generation capabilities.

File: tutorials/varenyk_shop/reporting/orders

#+import "tutorials/varenyk_shop/sources"

#+const
paymentTypes = ["credit_card", "paypal", "wire_transfer"];
#+end

#+src sql Orders()
#+meta {
  :publication {
    :type "table",
    :name "orders"
  }
}
#+begin
SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    os.status_name AS order_status,
#+for t : paymentTypes do
    SUM(CASE WHEN pt.payment_type_name = '{{ t }}' THEN ol.price ELSE 0 END) AS {{ t }}_amount,
#+end
    SUM(ol.price) AS amount
FROM
    {{ sources.Orders() }} AS o
    INNER JOIN {{ sources.OrderLines() }} AS ol
        ON ol.order_id = o.order_id
    INNER JOIN {{ sources.PaymentTypes() }} AS pt
        ON pt.payment_type_id = o.payment_type_id
    INNER JOIN {{ sources.OrderStatuses() }} AS os
        ON os.order_status_id = o.order_status
GROUP BY
    o.order_id,
    o.customer_id,
    o.order_date,
    os.status_name;
#+end

Generated SQL Preview: The CoginitiScript template expands to:

SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    os.status_name AS order_status,
    SUM(CASE WHEN pt.payment_type_name = 'credit_card' THEN ol.price ELSE 0 END) AS credit_card_amount,
    SUM(CASE WHEN pt.payment_type_name = 'paypal' THEN ol.price ELSE 0 END) AS paypal_amount,
    SUM(CASE WHEN pt.payment_type_name = 'wire_transfer' THEN ol.price ELSE 0 END) AS wire_transfer_amount,
    SUM(ol.price) AS amount
FROM ...

Benefits of Dynamic Generation:

  • Maintainability: Add new payment types by updating the constant
  • DRY Principle: No repetitive CASE expressions
  • Flexibility: Easy to modify payment type logic

Pipeline Orchestration

Automated Publication Execution

The pipeline orchestration is handled by tutorials/varenyk_shop/schedule_nightly:

#+import "std/publication"
#+import "tutorials/varenyk_shop/reporting"

{{ publication.Run(packages = [reporting]) }};

How it works:

  1. Import required packages: Standard library and reporting layer
  2. Execute publications: Runs all blocks with publication metadata in the reporting package
  3. Dependency resolution: CoginitiScript handles execution order automatically

Scheduling (Enterprise Feature)

In Coginiti Team/Enterprise, you can schedule this asset to run nightly:

  1. Right-click on schedule_nightly
  2. Select "Schedule"
  3. Configure timing and database connection
  4. Enable the schedule

Exploring the Results

Query the Customer Dataset

SELECT
    customer_id,
    first_name,
    last_name,
    first_order,
    recent_order,
    total_orders,
    ltv
FROM customers
ORDER BY ltv DESC
LIMIT 10;

Sample Results:

customer_id | first_name | last_name | first_order | recent_order | total_orders | ltv
------------|------------|-----------|-------------|--------------|--------------|-------
15          | Maria      | Petrenko  | 2024-01-15  | 2024-03-20   | 8            | 245.50
7           | Ivan       | Kovalenko | 2024-01-20  | 2024-03-18   | 6            | 198.75

Query the Orders Dataset

SELECT
    order_id,
    customer_id,
    order_date,
    order_status,
    credit_card_amount,
    paypal_amount,
    wire_transfer_amount,
    amount
FROM orders
WHERE order_date >= '2024-03-01'
ORDER BY order_date DESC;

Advanced Patterns Demonstrated

1. Package Organization

-- Clean imports at the top of each file
#+import "tutorials/varenyk_shop/sources"
#+import "std/publication"

2. Block Composition

-- Breaking complex logic into manageable pieces
SELECT * FROM {{ rfmValues() }}  -- Intermediate calculation
-- Used in final result assembly

3. Dynamic SQL Generation

-- Using constants and loops for maintainable code
#+const
paymentTypes = ["credit_card", "paypal", "wire_transfer"];
#+end

#+for t : paymentTypes do
    -- Generate columns dynamically
#+end

4. Publication Strategies

-- Materializing results for performance
#+meta {
  :publication {
    :type "table",
    :name "customers"
  }
}

Best Practices Demonstrated

1. Layered Architecture

  • Source layer: Abstract raw data
  • Reporting layer: Business logic and analytics
  • Orchestration: Pipeline execution

2. Documentation

#+meta {
  :doc "Clear description of what this block does,
        including parameter explanations and business context."
}

3. Naming Conventions

  • Block names: PascalCase (Customers(), OrderLines())
  • Column names: snake_case (customer_id, total_orders)
  • Package names: lowercase (sources, reporting)

4. Error Prevention

  • LEFT JOIN: Handle customers without orders gracefully
  • Standardized IDs: Consistent naming across all sources
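
If you want customers with no orders to surface zeros instead of NULLs, a COALESCE variant of the final join inside reporting/customers (a sketch, not part of the shipped project) could look like this:

SELECT
    c.customer_id,
    COALESCE(r.total_orders, 0) AS total_orders,
    COALESCE(r.total_sales, 0) AS ltv
FROM
    {{ sources.Customers() }} AS c
    LEFT JOIN {{ rfmValues() }} AS r
        ON r.customer_id = c.customer_id;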

Extending the Tutorial

Add New Metrics

Want to add customer acquisition cost? Add to rfmValues():

#+src sql rfmValues()
#+begin
SELECT
    customer_id,
    -- Existing metrics...
    COUNT(DISTINCT DATE_TRUNC('month', o.order_date)) AS active_months,
    AVG(ol.price) AS avg_order_value
FROM ...
#+end

Add New Payment Types

Simply update the constant:

#+const
paymentTypes = ["credit_card", "paypal", "wire_transfer", "crypto", "bank_transfer"];
#+end

The SQL will regenerate automatically!
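
For the new types, the loop emits two extra columns following the same pattern (they will only hold non-zero values once matching rows exist in src_payment_types):

SUM(CASE WHEN pt.payment_type_name = 'crypto' THEN ol.price ELSE 0 END) AS crypto_amount,
SUM(CASE WHEN pt.payment_type_name = 'bank_transfer' THEN ol.price ELSE 0 END) AS bank_transfer_amount,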

Create Product Analytics

Add a new reporting block:

#+import "tutorials/varenyk_shop/sources"

#+src sql ProductAnalysis()
#+meta {
  :publication {
    :type "table",
    :name "product_metrics"
  }
}
#+begin
SELECT
    p.product_id,
    p.name,
    COUNT(ol.order_line_id) AS total_orders,
    SUM(ol.quantity) AS total_quantity_sold,
    SUM(ol.price) AS total_revenue
FROM
    {{ sources.Products() }} AS p
    LEFT JOIN {{ sources.OrderLines() }} AS ol
        ON ol.product_id = p.product_id
GROUP BY
    p.product_id, p.name;
#+end

Troubleshooting

Common Issues

Issue: "Block not found" error

Solution: Check import statements and block names
- Ensure packages are imported correctly
- Verify block names match exactly (case-sensitive)

Issue: Publication fails

Solution: Check database permissions and table names
- Ensure database user can CREATE TABLE
- Verify table names don't conflict with existing objects
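
If it is a permissions problem, one common PostgreSQL fix is to grant the publishing role rights on the target schema (public and your_analytics_user are placeholders here):

GRANT USAGE, CREATE ON SCHEMA public TO your_analytics_user;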

Issue: Dynamic SQL doesn't generate correctly

Solution: Test template expansion
- Use simple test queries to verify loop output
- Check constant definitions and data types
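
A minimal way to inspect loop expansion is a throwaway block that only exercises the template, using the same #+const / #+for syntax as the orders asset (a sketch; run it in a query tab):

#+const
testValues = ["alpha", "beta"];
#+end

#+src sql loopCheck()
#+begin
SELECT
#+for v : testValues do
    '{{ v }}' AS {{ v }}_label,
#+end
    1 AS marker;
#+end

SELECT * FROM {{ loopCheck() }};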

Next Steps

Learn More CoginitiScript Features

See the CoginitiScript Reference (linked in the Prerequisites) for full documentation of blocks, packages, metadata, and the standard library.

Explore Advanced Patterns

  • Incremental processing: Update only changed data
  • Data quality testing: Implement validation rules
  • Multi-environment deployment: Promote code through dev/test/prod

Build Your Own Pipeline

Apply these patterns to your data:

  1. Identify your source systems
  2. Design your analytics requirements
  3. Create a source abstraction layer
  4. Build transformation logic
  5. Set up publication and scheduling

Summary

This tutorial demonstrated how to build a complete analytics pipeline using CoginitiScript:

✅ Structured project organization with packages and layers
✅ Source data abstraction for maintainability
✅ Business logic implementation with RFM analysis
✅ Dynamic SQL generation for flexible reporting
✅ Publication strategies for data materialization
✅ Pipeline orchestration for automation

CoginitiScript's features—blocks, packages, templates, and publication—provide a powerful foundation for building scalable, maintainable data transformation pipelines. The patterns shown here can be adapted to virtually any analytics use case.

Start with this foundation and gradually add complexity as your requirements grow. Happy coding with CoginitiScript!