
Building an E-commerce Analytics Pipeline with CoginitiScript

Welcome to this comprehensive tutorial on building data transformation pipelines with CoginitiScript! We'll use a fictional e-commerce store called Varenyk Shop to demonstrate how to transform raw sales data into analytics-ready datasets.

What You'll Learn

By the end of this tutorial, you'll understand how to:

  • Structure CoginitiScript projects using packages and layers
  • Create reusable source data abstractions
  • Build analytical datasets with RFM (Recency, Frequency, Monetary) analysis
  • Use dynamic SQL generation for flexible reporting
  • Implement publication strategies for data materialization
  • Set up automated data pipelines

Prerequisites

  • Basic understanding of SQL
  • Access to a PostgreSQL database
  • Coginiti Team or Enterprise account
  • Familiarity with CoginitiScript basics (see CoginitiScript Reference)

About Varenyk Shop

Varenyk Shop is a fictional e-commerce store that sells varenyky - traditional Ukrainian dumplings. The business needs to analyze customer behavior, order patterns, and payment preferences to make data-driven decisions.

Raw Data Overview

Our source system contains the following tables with raw transactional data:

Source Tables:

  • src_customers - Customer information
  • src_products - Product catalog
  • src_orders - Order headers
  • src_order_lines - Order line items
  • src_order_statuses - Order status lookup
  • src_payment_types - Payment method lookup

Analytics Goals: We want to transform this raw data into two main analytics datasets:

  • Customers dataset - Customer analysis with RFM metrics
  • Orders dataset - Order analysis with payment breakdowns

Tutorial Setup

Step 1: Download and Import the Project

  1. Download the Coginiti catalog package: varenyk_shop.coginitipkg
  2. Import this package into your Coginiti catalog root folder
  3. Navigate to the imported project in your catalog

Step 2: Create Sample Data

  1. Open a new query tab in Coginiti
  2. Select your PostgreSQL database connection
  3. Run the following script to generate sample data:
#+import "tutorials/varenyk_shop/seed"

{{ seed.Run() }};

This creates all source tables and populates them with sample e-commerce data.
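
To sanity-check the seed, a quick row count against a few of the raw tables (table names from the source list above) should return non-zero results:

SELECT 'src_customers' AS table_name, COUNT(*) AS row_count FROM src_customers
UNION ALL
SELECT 'src_orders', COUNT(*) FROM src_orders
UNION ALL
SELECT 'src_order_lines', COUNT(*) FROM src_order_lines;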

Step 3: Execute the Pipeline

  1. Open tutorials/varenyk_shop/schedule_nightly
  2. Execute it on the same PostgreSQL database
  3. Verify that customers and orders tables are created
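
One way to verify is a standard PostgreSQL catalog query (this assumes the tables were published to the schema your connection points at):

SELECT table_name
FROM information_schema.tables
WHERE table_name IN ('customers', 'orders');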

Now let's dive into how this pipeline is structured and implemented!

Project Structure

The tutorial project follows a layered architecture pattern:

tutorials/
└── varenyk_shop/
    ├── reporting/              # Analytics layer
    │   ├── customers           # Customer analytics dataset
    │   └── orders              # Order analytics dataset
    ├── sources/                # Source data abstraction layer
    │   └── sources             # Source table definitions
    ├── seed/                   # Sample data generation
    │   ├── gen/
    │   │   └── person          # Person data generator
    │   ├── clean               # Data cleanup utilities
    │   └── source_data         # Source data creation
    └── schedule_nightly        # Pipeline orchestration

This structure demonstrates several CoginitiScript best practices:

  • Layered architecture for maintainability
  • Package organization for code reuse
  • Separation of concerns between data layers

Layer 1: Source Data Abstraction

Why Create a Source Layer?

The source layer provides several key benefits:

  • Abstraction: Downstream transformations reference blocks, not tables directly
  • Maintainability: Changes to source systems only affect the source layer
  • Cleanliness: Transformation logic stays focused on business rules
  • Consistency: Standardized column names and data types

Source Block Definitions

Let's examine the source blocks in tutorials/varenyk_shop/sources/sources:

#+src sql Customers()
#+begin
SELECT
    id AS customer_id,
    first_name,
    last_name,
    email
FROM
    src_customers;
#+end

#+src sql Products()
#+begin
SELECT
    id AS product_id,
    name
FROM
    src_products;
#+end

#+src sql Orders()
#+begin
SELECT
    id AS order_id,
    order_date,
    customer_id,
    payment_type_id,
    order_status
FROM
    src_orders;
#+end

#+src sql OrderLines()
#+begin
SELECT
    id AS order_line_id,
    order_id,
    product_id,
    quantity,
    price
FROM
    src_order_lines;
#+end

#+src sql OrderStatuses()
#+begin
SELECT
    id AS order_status_id,
    status_name
FROM
    src_order_statuses;
#+end

#+src sql PaymentTypes()
#+begin
SELECT
    id AS payment_type_id,
    payment_type_name
FROM
    src_payment_types;
#+end

Key Features:

  • Column renaming: Standardize ID column names across tables
  • Column selection: Only expose needed columns
  • Clean interface: Downstream consumers don't need to know about source table structure
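
Any downstream asset can reference these blocks instead of the raw tables. As a minimal sanity check (assuming the seed data is loaded), a query like this confirms the source layer resolves:

#+import "tutorials/varenyk_shop/sources"

SELECT COUNT(*) AS customer_count
FROM {{ sources.Customers() }};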

Layer 2: Analytics Models

Customer Analytics with RFM Analysis

RFM analysis is a proven method for customer segmentation:

  • Recency: How recently did the customer make a purchase?
  • Frequency: How often does the customer purchase?
  • Monetary Value: How much does the customer spend?
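
As a rough illustration of how these dimensions become scores, the sketch below assigns quintile-based R, F, and M scores on top of the published customers table (NTILE is standard PostgreSQL; the scoring scheme is an example, not part of the tutorial project):

SELECT
    customer_id,
    NTILE(5) OVER (ORDER BY recent_order) AS recency_score,   -- 5 = most recent purchase
    NTILE(5) OVER (ORDER BY total_orders) AS frequency_score, -- 5 = most orders
    NTILE(5) OVER (ORDER BY ltv) AS monetary_score            -- 5 = highest lifetime spend
FROM customers;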

Implementation Strategy

We split the customer analytics into two blocks for better maintainability:

File: tutorials/varenyk_shop/reporting/customers

#+import "tutorials/varenyk_shop/sources"

#+src sql rfmValues()
#+meta {
  :doc "Calculates RFM metrics for each customer:
    - first_order: Date of first purchase
    - recent_order: Date of most recent purchase
    - total_orders: Total number of orders (Frequency)
    - total_sales: Total amount spent (Monetary Value)"
}
#+begin
SELECT
    customer_id,
    MIN(o.order_date) AS first_order,
    MAX(o.order_date) AS recent_order,
    COUNT(o.order_id) AS total_orders,
    SUM(ol.price) AS total_sales
FROM
    {{ sources.Orders() }} AS o
    INNER JOIN {{ sources.OrderLines() }} AS ol
        ON ol.order_id = o.order_id
GROUP BY
    customer_id;
#+end

#+src sql Customers()
#+meta {
  :publication {
    :type "table",
    :name "customers"
  }
}
#+begin
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    r.first_order,
    r.recent_order,
    r.total_orders,
    r.total_sales AS ltv  -- Lifetime Value
FROM
    {{ sources.Customers() }} AS c
    LEFT JOIN {{ rfmValues() }} AS r
        ON r.customer_id = c.customer_id;
#+end

Design Highlights:

  • Block composition: rfmValues() calculates metrics, Customers() assembles final result
  • Source references: Uses sources.Orders() and sources.OrderLines() blocks
  • Publication strategy: Materializes result as customers table
  • Documentation: Clear metadata explaining business logic

Order Analytics with Dynamic Payment Columns

The orders dataset demonstrates CoginitiScript's dynamic SQL generation capabilities.

File: tutorials/varenyk_shop/reporting/orders

#+import "tutorials/varenyk_shop/sources"

#+const
paymentTypes = ["credit_card", "paypal", "wire_transfer"];
#+end

#+src sql Orders()
#+meta {
  :publication {
    :type "table",
    :name "orders"
  }
}
#+begin
SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    os.status_name AS order_status,
#+for t : paymentTypes do
    SUM(CASE WHEN pt.payment_type_name = '{{ t }}' THEN ol.price ELSE 0 END) AS {{ t }}_amount,
#+end
    SUM(ol.price) AS amount
FROM
    {{ sources.Orders() }} AS o
    INNER JOIN {{ sources.OrderLines() }} AS ol
        ON ol.order_id = o.order_id
    INNER JOIN {{ sources.PaymentTypes() }} AS pt
        ON pt.payment_type_id = o.payment_type_id
    INNER JOIN {{ sources.OrderStatuses() }} AS os
        ON os.order_status_id = o.order_status
GROUP BY
    o.order_id,
    o.customer_id,
    o.order_date,
    os.status_name;
#+end

Generated SQL Preview: The CoginitiScript template expands to:

SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    os.status_name AS order_status,
    SUM(CASE WHEN pt.payment_type_name = 'credit_card' THEN ol.price ELSE 0 END) AS credit_card_amount,
    SUM(CASE WHEN pt.payment_type_name = 'paypal' THEN ol.price ELSE 0 END) AS paypal_amount,
    SUM(CASE WHEN pt.payment_type_name = 'wire_transfer' THEN ol.price ELSE 0 END) AS wire_transfer_amount,
    SUM(ol.price) AS amount
FROM ...

Benefits of Dynamic Generation:

  • Maintainability: Add new payment types by updating the constant
  • DRY Principle: No repetitive CASE expressions
  • Flexibility: Easy to modify payment type logic

Pipeline Orchestration

Automated Publication Execution

The pipeline orchestration is handled by tutorials/varenyk_shop/schedule_nightly:

#+import "std/publication"
#+import "tutorials/varenyk_shop/reporting"

{{ publication.Run(packages = [reporting]) }};

How it works:

  1. Import required packages: Standard library and reporting layer
  2. Execute publications: Runs all blocks with publication metadata in the reporting package
  3. Dependency resolution: CoginitiScript handles execution order automatically

Scheduling (Enterprise Feature)

In Coginiti Team/Enterprise, you can schedule this asset to run nightly:

  1. Right-click on schedule_nightly
  2. Select "Schedule"
  3. Configure timing and database connection
  4. Enable the schedule

Exploring the Results

Query the Customer Dataset

SELECT
    customer_id,
    first_name,
    last_name,
    first_order,
    recent_order,
    total_orders,
    ltv
FROM customers
ORDER BY ltv DESC
LIMIT 10;

Sample Results:

customer_id | first_name | last_name | first_order | recent_order | total_orders | ltv
------------|------------|-----------|-------------|--------------|--------------|-------
15          | Maria      | Petrenko  | 2024-01-15  | 2024-03-20   | 8            | 245.50
7           | Ivan       | Kovalenko | 2024-01-20  | 2024-03-18   | 6            | 198.75

Query the Orders Dataset

SELECT
    order_id,
    customer_id,
    order_date,
    order_status,
    credit_card_amount,
    paypal_amount,
    wire_transfer_amount,
    amount
FROM orders
WHERE order_date >= '2024-03-01'
ORDER BY order_date DESC;

Advanced Patterns Demonstrated

1. Package Organization

-- Clean imports at the top of each file
#+import "tutorials/varenyk_shop/sources"
#+import "std/publication"

2. Block Composition

-- Breaking complex logic into manageable pieces
SELECT * FROM {{ rfmValues() }}  -- Intermediate calculation
-- Used in final result assembly

3. Dynamic SQL Generation

-- Using constants and loops for maintainable code
#+const
paymentTypes = ["credit_card", "paypal", "wire_transfer"];
#+end

#+for t : paymentTypes do
    -- Generate columns dynamically
#+end

4. Publication Strategies

-- Materializing results for performance
#+meta {
  :publication {
    :type "table",
    :name "customers"
  }
}

Best Practices Demonstrated

1. Layered Architecture

  • Source layer: Abstract raw data
  • Reporting layer: Business logic and analytics
  • Orchestration: Pipeline execution

2. Documentation

#+meta {
  :doc "Clear description of what this block does,
        including parameter explanations and business context."
}

3. Naming Conventions

  • Block names: PascalCase (Customers(), OrderLines())
  • Column names: snake_case (customer_id, total_orders)
  • Package names: lowercase (sources, reporting)

4. Error Prevention

  • LEFT JOIN: Handle customers without orders gracefully
  • Standardized IDs: Consistent naming across all sources
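
If you want customers with no orders to surface zeros instead of NULLs, a COALESCE variant of the final join inside reporting/customers (a sketch, not part of the shipped project) could look like this:

SELECT
    c.customer_id,
    COALESCE(r.total_orders, 0) AS total_orders,
    COALESCE(r.total_sales, 0) AS ltv
FROM
    {{ sources.Customers() }} AS c
    LEFT JOIN {{ rfmValues() }} AS r
        ON r.customer_id = c.customer_id;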

Extending the Tutorial

Add New Metrics

Want to add customer acquisition cost? Add to rfmValues():

#+src sql rfmValues()
#+begin
SELECT
    customer_id,
    -- Existing metrics...
    COUNT(DISTINCT DATE_TRUNC('month', o.order_date)) AS active_months,
    AVG(ol.price) AS avg_order_value
FROM ...
#+end

Add New Payment Types

Simply update the constant:

#+const
paymentTypes = ["credit_card", "paypal", "wire_transfer", "crypto", "bank_transfer"];
#+end

The SQL will regenerate automatically!
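
For the new types, the loop emits two extra columns following the same pattern (they will only hold non-zero values once matching rows exist in src_payment_types):

SUM(CASE WHEN pt.payment_type_name = 'crypto' THEN ol.price ELSE 0 END) AS crypto_amount,
SUM(CASE WHEN pt.payment_type_name = 'bank_transfer' THEN ol.price ELSE 0 END) AS bank_transfer_amount,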

Create Product Analytics

Add a new reporting block:

#+import "tutorials/varenyk_shop/sources"

#+src sql ProductAnalysis()
#+meta {
  :publication {
    :type "table",
    :name "product_metrics"
  }
}
#+begin
SELECT
    p.product_id,
    p.name,
    COUNT(ol.order_line_id) AS total_orders,
    SUM(ol.quantity) AS total_quantity_sold,
    SUM(ol.price) AS total_revenue
FROM
    {{ sources.Products() }} AS p
    LEFT JOIN {{ sources.OrderLines() }} AS ol
        ON ol.product_id = p.product_id
GROUP BY
    p.product_id, p.name;
#+end

Troubleshooting

Common Issues

Issue: "Block not found" error

Solution: Check import statements and block names
- Ensure packages are imported correctly
- Verify block names match exactly (case-sensitive)

Issue: Publication fails

Solution: Check database permissions and table names
- Ensure database user can CREATE TABLE
- Verify table names don't conflict with existing objects
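
If it is a permissions problem, one common PostgreSQL fix is to grant the publishing role rights on the target schema (public and your_analytics_user are placeholders here):

GRANT USAGE, CREATE ON SCHEMA public TO your_analytics_user;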

Issue: Dynamic SQL doesn't generate correctly

Solution: Test template expansion
- Use simple test queries to verify loop output
- Check constant definitions and data types
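
A minimal way to inspect loop expansion is a throwaway block that only exercises the template, using the same #+const / #+for syntax as the orders asset (a sketch; run it in a query tab):

#+const
testValues = ["alpha", "beta"];
#+end

#+src sql loopCheck()
#+begin
SELECT
#+for v : testValues do
    '{{ v }}' AS {{ v }}_label,
#+end
    1 AS marker;
#+end

SELECT * FROM {{ loopCheck() }};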

Next Steps

Learn More CoginitiScript Features

See the CoginitiScript Reference (linked in the Prerequisites) for full documentation of blocks, packages, metadata, and the standard library.

Explore Advanced Patterns

  • Incremental processing: Update only changed data
  • Data quality testing: Implement validation rules
  • Multi-environment deployment: Promote code through dev/test/prod

Build Your Own Pipeline

Apply these patterns to your data:

  1. Identify your source systems
  2. Design your analytics requirements
  3. Create a source abstraction layer
  4. Build transformation logic
  5. Set up publication and scheduling

Summary

This tutorial demonstrated how to build a complete analytics pipeline using CoginitiScript:

✅ Structured project organization with packages and layers
✅ Source data abstraction for maintainability
✅ Business logic implementation with RFM analysis
✅ Dynamic SQL generation for flexible reporting
✅ Publication strategies for data materialization
✅ Pipeline orchestration for automation

CoginitiScript's features—blocks, packages, templates, and publication—provide a powerful foundation for building scalable, maintainable data transformation pipelines. The patterns shown here can be adapted to virtually any analytics use case.

Start with this foundation and gradually add complexity as your requirements grow. Happy coding with CoginitiScript!