Great Expectations to CoginitiScript Conversion Reference

This reference guide provides direct mappings between Python-based Great Expectations tests and their SQL-based CoginitiScript equivalents. Use it as a lookup table when converting an existing Great Expectations suite, or when you already know a Great Expectations pattern and need its CoginitiScript counterpart.

Conversion Model

CoginitiScript uses an assertion-based model where:

  • Pass: Query returns zero rows
  • Fail: Query returns rows showing which records failed

This contrasts with Great Expectations' boolean pass/fail model, but it returns more diagnostic information: a failing test hands you the failing records themselves.
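For example, a minimal not-null check (against a hypothetical users table) passes when the query returns nothing, and on failure returns the offending rows directly:

#+test sql UsersHaveEmail()
#+begin
-- Zero rows returned = pass; each returned row is a failing record.
SELECT user_id
FROM users
WHERE email IS NULL;
#+end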

Why Convert to SQL-Based Tests

SQL-based data quality tests offer several advantages:

  • Performance: Run directly on the database without data transfer overhead
  • Simplicity: No Python environment or dependencies required
  • Integration: Fit naturally into SQL-based data pipelines
  • Transparency: Show exactly which records fail, not just that failures exist
  • Portability: Work across different database platforms with minor syntax adjustments

Distribution Tests

Distribution tests validate statistical properties and ranges to detect anomalies and ensure consistency.

Values Between Min/Max

Great Expectations: ExpectColumnValuesToBeBetween

CoginitiScript Equivalent:

#+test sql ValuesBetweenRange()
#+begin
SELECT *
FROM orders
WHERE order_amount NOT BETWEEN 0 AND 10000;
#+end

Z-Score Outlier Detection

Great Expectations: ExpectColumnValueZScoresToBeLessThan

CoginitiScript Equivalent:

#+test sql ZScoreOutliers()
#+begin
WITH stats AS (
    SELECT
        AVG(price) as mean,
        STDDEV(price) as std
    FROM products
)
SELECT
    p.product_id,
    p.price,
    ABS(p.price - s.mean) / NULLIF(s.std, 0) as z_score
FROM products p
CROSS JOIN stats s
WHERE ABS(p.price - s.mean) / NULLIF(s.std, 0) > 3;
#+end

Mean Within Expected Range

Great Expectations: Summary statistic expectations (mean)

CoginitiScript Equivalent:

#+test sql MeanInRange()
#+begin
SELECT
    'Mean out of range' as issue,
    avg_response
FROM (
    SELECT AVG(response_time_ms) as avg_response
    FROM api_calls
    WHERE date = CURRENT_DATE
) daily_stats
WHERE avg_response NOT BETWEEN 100 AND 500;
#+end

Percentile Validation

Great Expectations: Quantile expectations

CoginitiScript Equivalent:

#+test sql PercentileThreshold()
#+begin
SELECT
    'P95 too high' as issue,
    p95
FROM (
    SELECT PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency) as p95
    FROM requests
) pct
WHERE p95 > 1000;
#+end

Median Within Range

Great Expectations: Summary statistic expectations (median)

CoginitiScript Equivalent:

#+test sql MedianInRange()
#+begin
SELECT
    'Median out of range' as issue,
    median_val
FROM (
    SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY price) as median_val
    FROM products
) med
WHERE median_val NOT BETWEEN 50 AND 500;
#+end

Standard Deviation Check

Great Expectations: Summary statistic expectations (standard deviation)

CoginitiScript Equivalent:

#+test sql StandardDeviationInRange()
#+begin
SELECT
    'Standard deviation out of range' as issue,
    STDDEV(transaction_amount) as std_dev
FROM transactions
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
HAVING STDDEV(transaction_amount) NOT BETWEEN 10 AND 1000;
#+end

Uniqueness Tests

Uniqueness tests validate distinct values and ensure keys remain unique.

Single Column Uniqueness

Great Expectations: ExpectColumnValuesToBeUnique

CoginitiScript Equivalent:

#+test sql UniqueValues()
#+begin
SELECT
    email,
    COUNT(*) as duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
#+end

Composite Key Uniqueness

Great Expectations: ExpectCompoundColumnsToBeUnique

CoginitiScript Equivalent:

#+test sql CompositeKeyUnique()
#+begin
SELECT
    customer_id,
    order_date,
    COUNT(*) as duplicate_count
FROM orders
GROUP BY customer_id, order_date
HAVING COUNT(*) > 1;
#+end

Uniqueness Ratio in Range

Great Expectations: ExpectColumnProportionOfUniqueValuesToBeBetween

CoginitiScript Equivalent:

#+test sql UniquenessRatio()
#+begin
SELECT
    'Uniqueness ratio out of range' as issue,
    CAST(COUNT(DISTINCT email) AS FLOAT) / NULLIF(COUNT(*), 0) as uniqueness_ratio
FROM customers
HAVING CAST(COUNT(DISTINCT email) AS FLOAT) / NULLIF(COUNT(*), 0) NOT BETWEEN 0.8 AND 1.0;
#+end

Distinct Value Count in Range

Great Expectations: ExpectColumnUniqueValueCountToBeBetween

CoginitiScript Equivalent:

#+test sql DistinctCountInRange()
#+begin
SELECT
    'Distinct count out of range' as issue,
    COUNT(DISTINCT state) as distinct_states
FROM addresses
HAVING COUNT(DISTINCT state) NOT BETWEEN 5 AND 50;
#+end

Values Unique Within Row

Great Expectations: ExpectSelectColumnValuesToBeUniqueWithinRecord

CoginitiScript Equivalent:

#+test sql ValuesUniqueWithinRow()
#+begin
SELECT
    contact_id,
    phone1,
    phone2,
    phone3
FROM contacts
WHERE phone1 = phone2
   OR phone1 = phone3
   OR phone2 = phone3;
#+end

Detailed Duplicate Analysis

CoginitiScript Pattern:

#+test sql DetailedDuplicateAnalysis()
#+begin
SELECT
    username,
    COUNT(*) as occurrence_count,
    STRING_AGG(user_id::TEXT, ', ') as conflicting_ids
FROM users
GROUP BY username
HAVING COUNT(*) > 1
ORDER BY occurrence_count DESC;
#+end

Schema Validation Tests

Schema tests validate table structure, column presence, data types, and schema conformance.

Column Exists

Great Expectations: ExpectColumnToExist

CoginitiScript Equivalent:

#+test sql ColumnExists()
#+begin
SELECT 'Column email does not exist' as issue
WHERE NOT EXISTS (
    SELECT 1
    FROM information_schema.columns
    WHERE table_name = 'users'
      AND column_name = 'email'
      AND table_schema = 'public'
);
#+end

Exact Column Count

Great Expectations: ExpectTableColumnCountToEqual

CoginitiScript Equivalent:

#+test sql ExactColumnCount()
#+begin
SELECT
    'Column count mismatch' as issue,
    COUNT(*) as actual_count,
    15 as expected_count
FROM information_schema.columns
WHERE table_name = 'orders'
  AND table_schema = 'public'
HAVING COUNT(*) != 15;
#+end

Column Count in Range

Great Expectations: ExpectTableColumnCountToBeBetween

CoginitiScript Equivalent:

#+test sql ColumnCountInRange()
#+begin
SELECT
    'Column count out of range' as issue,
    COUNT(*) as actual_count
FROM information_schema.columns
WHERE table_name = 'transactions'
  AND table_schema = 'public'
HAVING COUNT(*) NOT BETWEEN 10 AND 20;
#+end

Columns Match Expected Set

Great Expectations: ExpectTableColumnsToMatchSet

CoginitiScript Equivalent:

#+test sql ColumnsMatchSet()
#+begin
WITH expected_columns AS (
    SELECT unnest(ARRAY['id', 'name', 'email', 'created_at']) as col_name
),
actual_columns AS (
    SELECT column_name as col_name
    FROM information_schema.columns
    WHERE table_name = 'users' AND table_schema = 'public'
),
missing AS (
    SELECT col_name FROM expected_columns
    EXCEPT
    SELECT col_name FROM actual_columns
),
extra AS (
    SELECT col_name FROM actual_columns
    EXCEPT
    SELECT col_name FROM expected_columns
)
SELECT
    'Schema mismatch' as issue,
    COALESCE((SELECT STRING_AGG(col_name, ', ') FROM missing), '') as missing_columns,
    COALESCE((SELECT STRING_AGG(col_name, ', ') FROM extra), '') as extra_columns
FROM (SELECT 1) dummy
WHERE EXISTS (SELECT 1 FROM missing)
   OR EXISTS (SELECT 1 FROM extra);
#+end

Columns in Expected Order

Great Expectations: ExpectTableColumnsToMatchOrderedList

CoginitiScript Equivalent:

#+test sql ColumnsInOrder()
#+begin
WITH expected AS (
    SELECT
        unnest(ARRAY['id', 'customer_id', 'order_date', 'amount']) as col_name,
        generate_series(1, 4) as position
),
actual AS (
    SELECT column_name as col_name, ordinal_position as position
    FROM information_schema.columns
    WHERE table_name = 'orders' AND table_schema = 'public'
)
SELECT
    e.position,
    e.col_name as expected_column,
    a.col_name as actual_column
FROM expected e
FULL OUTER JOIN actual a ON e.position = a.position
WHERE e.col_name IS DISTINCT FROM a.col_name;
#+end

Column Type Validation

Great Expectations: ExpectColumnValuesToBeOfType

CoginitiScript Equivalent:

#+test sql ColumnType()
#+begin
SELECT
    'Type mismatch for column email' as issue,
    data_type as actual_type,
    'character varying' as expected_type
FROM information_schema.columns
WHERE table_name = 'users'
  AND table_schema = 'public'
  AND column_name = 'email'
  AND data_type != 'character varying';
#+end

Column Type in Allowed List

Great Expectations: ExpectColumnValuesToBeInTypeList

CoginitiScript Equivalent:

#+test sql ColumnTypeInList()
#+begin
SELECT
    'Invalid type for amount column' as issue,
    data_type as actual_type
FROM information_schema.columns
WHERE table_name = 'transactions'
  AND table_schema = 'public'
  AND column_name = 'amount'
  AND data_type NOT IN ('numeric', 'decimal', 'double precision', 'money');
#+end

Multi-Column Type Validation

CoginitiScript Pattern:

#+test sql MultiColumnTypeValidation()
#+begin
WITH expected_types AS (
    SELECT 'id' as col, 'integer' as expected_type
    UNION ALL SELECT 'email', 'character varying'
    UNION ALL SELECT 'created_at', 'timestamp without time zone'
    UNION ALL SELECT 'balance', 'numeric'
)
SELECT
    e.col as column_name,
    e.expected_type,
    c.data_type as actual_type
FROM expected_types e
LEFT JOIN information_schema.columns c
    ON c.column_name = e.col
    AND c.table_name = 'accounts'
    AND c.table_schema = 'public'
WHERE c.data_type IS NULL
   OR c.data_type != e.expected_type;
#+end

Data Integrity Tests

Data integrity tests validate relationships between columns and ensure consistency.

Column Pair Values Equal

Great Expectations: ExpectColumnPairValuesToBeEqual

CoginitiScript Equivalent:

#+test sql ColumnPairEqual()
#+begin
SELECT
    transaction_id,
    sender_reference,
    recipient_confirmation
FROM transactions
WHERE sender_reference != recipient_confirmation;
#+end

Multi-Column Sum Equals Expected

Great Expectations: ExpectMulticolumnSumToEqual

CoginitiScript Equivalent:

#+test sql MultiColumnSum()
#+begin
SELECT
    account_id,
    debits,
    credits,
    (debits + credits) as total
FROM account_balances
WHERE (debits + credits) != 0;
#+end

Column A Greater Than Column B

Great Expectations: ExpectColumnPairValuesAToBeGreaterThanB

CoginitiScript Equivalent:

#+test sql ColumnAGreaterThanB()
#+begin
SELECT
    transfer_id,
    sent_timestamp,
    received_timestamp
FROM transfers
WHERE sent_timestamp >= received_timestamp;
#+end

Cross-Column Percentage Validation

CoginitiScript Pattern:

#+test sql CrossColumnPercentage()
#+begin
SELECT
    order_id,
    subtotal,
    tax,
    total,
    (subtotal + tax) as calculated_total
FROM orders
WHERE ABS(total - (subtotal + tax)) > 0.01;
#+end

Referential Integrity Check

CoginitiScript Pattern:

#+test sql ReferentialIntegrity()
#+begin
SELECT
    o.order_id,
    o.customer_id
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
#+end

Cross-Table Balance Reconciliation

CoginitiScript Pattern:

#+test sql CrossTableBalance()
#+begin
SELECT
    'Balance mismatch' as issue,
    source_total,
    target_total,
    (source_total - target_total) as difference
FROM (
    SELECT SUM(amount) as source_total
    FROM source_transactions
    WHERE date = CURRENT_DATE
) s
CROSS JOIN (
    SELECT SUM(amount) as target_total
    FROM target_transactions
    WHERE date = CURRENT_DATE
) t
WHERE ABS(source_total - target_total) > 0.01;
#+end

Volume Tests

Volume tests validate row counts and detect volume anomalies.

Row Count Between Range

Great Expectations: ExpectTableRowCountToBeBetween

CoginitiScript Equivalent:

#+test sql RowCountInRange()
#+begin
SELECT
    'Row count out of range' as issue,
    COUNT(*) as actual_count,
    1000 as min_expected,
    5000 as max_expected
FROM daily_transactions
WHERE transaction_date = CURRENT_DATE
HAVING COUNT(*) NOT BETWEEN 1000 AND 5000;
#+end

Exact Row Count

Great Expectations: ExpectTableRowCountToEqual

CoginitiScript Equivalent:

#+test sql ExactRowCount()
#+begin
SELECT
    'Row count mismatch' as issue,
    COUNT(*) as actual_count,
    100 as expected_count
FROM batch_import
WHERE batch_id = 'BATCH_2024_001'
HAVING COUNT(*) != 100;
#+end

Row Count Matches Other Table

Great Expectations: ExpectTableRowCountToEqualOtherTable

CoginitiScript Equivalent:

#+test sql RowCountMatchesTable()
#+begin
SELECT
    'Row count mismatch between tables' as issue,
    source_count,
    target_count,
    (source_count - target_count) as difference
FROM (
    SELECT COUNT(*) as source_count
    FROM source_table
    WHERE processed_date = CURRENT_DATE
) s
CROSS JOIN (
    SELECT COUNT(*) as target_count
    FROM target_table
    WHERE processed_date = CURRENT_DATE
) t
WHERE source_count != target_count;
#+end

Row Count Growth Validation

CoginitiScript Pattern:

#+test sql VolumeGrowth()
#+begin
WITH daily_counts AS (
    SELECT COUNT(*) as today_count
    FROM user_events
    WHERE event_date = CURRENT_DATE
),
yesterday_counts AS (
    SELECT COUNT(*) as yesterday_count
    FROM user_events
    WHERE event_date = CURRENT_DATE - INTERVAL '1 day'
)
SELECT
    'Unexpected volume change' as issue,
    today_count,
    yesterday_count,
    ROUND(100.0 * (today_count - yesterday_count) / NULLIF(yesterday_count, 0), 2) as pct_change
FROM daily_counts
CROSS JOIN yesterday_counts
WHERE today_count < yesterday_count * 0.8
   OR today_count > yesterday_count * 2.0;
#+end

Partition Volume Check

CoginitiScript Pattern:

#+test sql PartitionVolume()
#+begin
SELECT
    product_category,
    COUNT(*) as row_count
FROM sales
WHERE sale_date = CURRENT_DATE
GROUP BY product_category
HAVING COUNT(*) < 10;
#+end

Missing Values and Completeness Tests

Completeness tests validate required data presence and null value patterns.

Column Values Not Null

Great Expectations: ExpectColumnValuesToNotBeNull

CoginitiScript Equivalent:

#+test sql NoNullValues()
#+begin
SELECT
    user_id,
    email
FROM users
WHERE email IS NULL;
#+end

Column Values Are Null

Great Expectations: ExpectColumnValuesToBeNull

CoginitiScript Equivalent:

#+test sql ValuesAreNull()
#+begin
SELECT
    order_id,
    cancellation_reason
FROM orders
WHERE status = 'completed'
  AND cancellation_reason IS NOT NULL;
#+end

Non-Null Proportion in Range

Great Expectations: ExpectColumnProportionOfNonNullValuesToBeBetween

CoginitiScript Equivalent:

#+test sql CompletenessRatio()
#+begin
SELECT
    'Completeness out of range' as issue,
    CAST(COUNT(phone_number) AS FLOAT) / NULLIF(COUNT(*), 0) as completeness_ratio
FROM customers
HAVING CAST(COUNT(phone_number) AS FLOAT) / NULLIF(COUNT(*), 0) NOT BETWEEN 0.7 AND 1.0;
#+end

Null Count Threshold

CoginitiScript Pattern:

#+test sql NullCountThreshold()
#+begin
SELECT
    'Too many null values' as issue,
    COUNT(*) as total_rows,
    COUNT(*) - COUNT(middle_name) as null_count,
    ROUND(100.0 * (COUNT(*) - COUNT(middle_name)) / NULLIF(COUNT(*), 0), 2) as null_pct
FROM employees
HAVING (COUNT(*) - COUNT(middle_name)) > 100;
#+end

Multi-Column Completeness

CoginitiScript Pattern:

#+test sql MultiColumnCompleteness()
#+begin
SELECT
    customer_id,
    email,
    phone,
    mailing_address
FROM customers
WHERE email IS NULL
  AND phone IS NULL
  AND mailing_address IS NULL;
#+end

Conditional Completeness

CoginitiScript Pattern:

#+test sql ConditionalCompleteness()
#+begin
SELECT
    order_id,
    shipping_address,
    order_type
FROM orders
WHERE order_type = 'physical'
  AND shipping_address IS NULL;
#+end

Completeness Trend Check

CoginitiScript Pattern:

#+test sql CompletenessTrend()
#+begin
WITH today_completeness AS (
    SELECT
        CAST(COUNT(optional_field) AS FLOAT) / NULLIF(COUNT(*), 0) as completeness
    FROM records
    WHERE created_date = CURRENT_DATE
)
SELECT
    'Completeness degradation' as issue,
    completeness as today_completeness
FROM today_completeness
WHERE completeness < 0.8;
#+end

Additional Pattern Tests

String Format Validation

CoginitiScript Pattern:

#+test sql EmailFormat()
#+begin
SELECT
    user_id,
    email
FROM users
WHERE email NOT LIKE '%_@__%.__%'
   OR email LIKE '%..%'
   OR email LIKE '%@%@%';
#+end

Date Range Validation

CoginitiScript Pattern:

#+test sql DateRange()
#+begin
SELECT
    event_id,
    event_date
FROM events
WHERE event_date < '1900-01-01'
   OR event_date > CURRENT_DATE + INTERVAL '10 years';
#+end

Enum/Category Validation

CoginitiScript Pattern:

#+test sql EnumValidation()
#+begin
SELECT
    order_id,
    status
FROM orders
WHERE status NOT IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled');
#+end

Sequence Gap Detection

CoginitiScript Pattern:

#+test sql SequenceGaps()
#+begin
WITH sequence_check AS (
    SELECT
        invoice_number,
        LEAD(invoice_number) OVER (ORDER BY invoice_number) as next_number
    FROM invoices
    WHERE created_date = CURRENT_DATE
)
SELECT
    invoice_number,
    next_number,
    (next_number - invoice_number) as gap_size
FROM sequence_check
WHERE next_number - invoice_number > 1;
#+end

Rate Limit Check

CoginitiScript Pattern:

#+test sql RateLimit()
#+begin
SELECT
    user_id,
    COUNT(*) as request_count
FROM api_requests
WHERE request_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
GROUP BY user_id
HAVING COUNT(*) > 1000;
#+end

Database-Specific Syntax

PostgreSQL

-- String aggregation
STRING_AGG(column_name::TEXT, ', ')

-- Array creation
unnest(ARRAY['value1', 'value2'])

-- Percentiles
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY column)

MySQL

-- String aggregation
GROUP_CONCAT(column_name SEPARATOR ', ')

-- Limited CTE support in older versions
-- Use subqueries instead

-- Array membership
FIND_IN_SET('value', column)
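
On MySQL versions without CTE support (before 8.0), tests written with WITH clauses can be rewritten using derived tables. As one sketch, the uniqueness-ratio test from earlier might look like this on MySQL (the customers table and 0.8 threshold are carried over from that example; note MySQL's / operator already returns a decimal, so no CAST is needed):

#+test sql UniquenessRatioMySQL()
#+begin
-- Derived table replaces the CTE for pre-8.0 MySQL compatibility.
SELECT
    'Uniqueness ratio out of range' as issue,
    r.uniqueness_ratio
FROM (
    SELECT COUNT(DISTINCT email) / NULLIF(COUNT(*), 0) as uniqueness_ratio
    FROM customers
) r
WHERE r.uniqueness_ratio NOT BETWEEN 0.8 AND 1.0;
#+end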

SQL Server

-- String aggregation (2017+)
STRING_AGG(column_name, ', ')

-- Percentiles
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY column) OVER ()

-- Safe type conversion
TRY_CONVERT(INT, column)
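
TRY_CONVERT returns NULL instead of raising an error when a value cannot be cast, which makes it useful for testing that string columns hold convertible values. A sketch against a hypothetical staging table (the table and column names are illustrative):

#+test sql CastableAmounts()
#+begin
-- Rows where raw_amount is present but not convertible to DECIMAL(12, 2).
SELECT
    row_id,
    raw_amount
FROM staging_payments
WHERE raw_amount IS NOT NULL
  AND TRY_CONVERT(DECIMAL(12, 2), raw_amount) IS NULL;
#+end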

Snowflake

-- String aggregation
LISTAGG(column_name, ', ')

-- Fast approximate percentiles
APPROX_PERCENTILE(column, 0.95)

-- Array creation
ARRAY_CONSTRUCT('value1', 'value2')
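
On Snowflake, APPROX_PERCENTILE can stand in for PERCENTILE_CONT when exact quantiles are too expensive on large tables. A sketch of the earlier P95 latency test in that style (same requests table and 1000 ms threshold; since the result is approximate, leave some slack in the threshold):

#+test sql PercentileThresholdSnowflake()
#+begin
SELECT
    'P95 too high' as issue,
    p95
FROM (
    SELECT APPROX_PERCENTILE(latency, 0.95) as p95
    FROM requests
)
WHERE p95 > 1000;
#+end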

Best Practices

Test Design

  1. Return diagnostic information: Include columns that help debug failures
  2. Use descriptive names: Test names should explain what's being validated
  3. Document thresholds: Explain why specific values matter
  4. Consider edge cases: Handle NULLs, empty tables, and boundary conditions

Performance

  1. Index appropriately: Ensure test queries use existing indexes
  2. Limit data ranges: Test recent data for faster feedback
  3. Use sampling: For statistical tests on large datasets
  4. Partition tables: Enable efficient time-based filtering
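
As an example of sampling, a statistical test can estimate its baseline from a random subset rather than the full table. On PostgreSQL, TABLESAMPLE does this cheaply; this sketch adapts the z-score outlier test, with the 1-percent sample rate purely illustrative:

#+test sql SampledZScoreOutliers()
#+begin
-- Estimate mean and stddev from ~1% of the table, then scan for outliers.
WITH stats AS (
    SELECT
        AVG(price) as mean,
        STDDEV(price) as std
    FROM products TABLESAMPLE SYSTEM (1)
)
SELECT
    p.product_id,
    p.price
FROM products p
CROSS JOIN stats s
WHERE ABS(p.price - s.mean) / NULLIF(s.std, 0) > 3;
#+end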

Organization

  1. Group by test type: Separate uniqueness, completeness, integrity tests
  2. Layer by severity: Critical tests separate from monitoring
  3. Version control: Track test changes with schema evolution
  4. Document dependencies: Note relationships between tests

Maintenance

  1. Review test failures: Don't ignore or auto-skip failing tests
  2. Update thresholds: Adjust ranges as business evolves
  3. Archive obsolete tests: Remove tests for deprecated features
  4. Monitor test performance: Optimize slow-running tests