Great Expectations to CoginitiScript Conversion Reference
This reference guide provides direct mappings between Python-based Great Expectations tests and SQL-based CoginitiScript equivalents. Use it as a lookup table when converting existing Great Expectations suites, or when you already know a Great Expectations pattern and need its CoginitiScript counterpart.
Conversion Model
CoginitiScript uses an assertion-based model where:
- Pass: Query returns zero rows
- Fail: Query returns rows showing which records failed
This contrasts with Great Expectations' boolean model but provides more diagnostic information.
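For example, the minimal test below (a sketch against an assumed orders table with a quantity column) asserts that every order has a positive quantity; any rows it returns are the violating records, so the failing data itself is the diagnostic output.
#+test sql PositiveQuantity()
#+begin
-- Sketch only: passes when no rows are returned; any returned rows
-- identify the offending records directly.
SELECT
    order_id,
    quantity
FROM orders
WHERE quantity <= 0;
#+end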
Why Convert to SQL-Based Tests
SQL-based data quality tests offer several advantages:
- Performance: Run directly on the database without data transfer overhead
- Simplicity: No Python environment or dependencies required
- Integration: Fit naturally into SQL-based data pipelines
- Transparency: Show exactly which records fail, not just that failures exist
- Portability: Work across different database platforms with minor syntax adjustments
Distribution Tests
Distribution tests validate statistical properties and ranges to detect anomalies and ensure consistency.
Values Between Min/Max
Great Expectations: ExpectColumnValuesToBeBetween
CoginitiScript Equivalent:
#+test sql ValuesBetweenRange()
#+begin
SELECT *
FROM orders
WHERE order_amount NOT BETWEEN 0 AND 10000;
#+end
Z-Score Outlier Detection
Great Expectations: ExpectColumnValueZScoresToBeLessThan
CoginitiScript Equivalent:
#+test sql ZScoreOutliers()
#+begin
WITH stats AS (
    SELECT
        AVG(price) as mean,
        STDDEV(price) as std
    FROM products
)
SELECT
    p.product_id,
    p.price,
    ABS(p.price - s.mean) / NULLIF(s.std, 0) as z_score
FROM products p
CROSS JOIN stats s
WHERE ABS(p.price - s.mean) / NULLIF(s.std, 0) > 3;
#+end
Mean Within Expected Range
Great Expectations: ExpectColumnMeanToBeBetween
CoginitiScript Equivalent:
#+test sql MeanInRange()
#+begin
SELECT
    'Mean out of range' as issue,
    avg_response
FROM (
    SELECT AVG(response_time_ms) as avg_response
    FROM api_calls
    WHERE date = CURRENT_DATE
) daily_stats
WHERE avg_response NOT BETWEEN 100 AND 500;
#+end
Percentile Validation
Great Expectations: ExpectColumnQuantileValuesToBeBetween
CoginitiScript Equivalent:
#+test sql PercentileThreshold()
#+begin
SELECT
    'P95 too high' as issue,
    p95
FROM (
    SELECT PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency) as p95
    FROM requests
) latency_stats
WHERE p95 > 1000;
#+end
Median Within Range
Great Expectations: ExpectColumnMedianToBeBetween
CoginitiScript Equivalent:
#+test sql MedianInRange()
#+begin
SELECT
    'Median out of range' as issue,
    median_val
FROM (
    SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY price) as median_val
    FROM products
) price_stats
WHERE median_val NOT BETWEEN 50 AND 500;
#+end
Standard Deviation Check
Great Expectations: ExpectColumnStdevToBeBetween
CoginitiScript Equivalent:
#+test sql StandardDeviationInRange()
#+begin
SELECT
'Standard deviation out of range' as issue,
STDDEV(transaction_amount) as std_dev
FROM transactions
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
HAVING STDDEV(transaction_amount) NOT BETWEEN 10 AND 1000;
#+end
Uniqueness Tests
Uniqueness tests verify that key columns and column combinations contain no duplicate values.
Single Column Uniqueness
Great Expectations: ExpectColumnValuesToBeUnique
CoginitiScript Equivalent:
#+test sql UniqueValues()
#+begin
SELECT
email,
COUNT(*) as duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
#+end
Composite Key Uniqueness
Great Expectations: ExpectCompoundColumnsToBeUnique
CoginitiScript Equivalent:
#+test sql CompositeKeyUnique()
#+begin
SELECT
customer_id,
order_date,
COUNT(*) as duplicate_count
FROM orders
GROUP BY customer_id, order_date
HAVING COUNT(*) > 1;
#+end
Uniqueness Ratio in Range
Great Expectations: ExpectColumnProportionOfUniqueValuesToBeBetween
CoginitiScript Equivalent:
#+test sql UniquenessRatio()
#+begin
SELECT
'Uniqueness ratio out of range' as issue,
CAST(COUNT(DISTINCT email) AS FLOAT) / NULLIF(COUNT(*), 0) as uniqueness_ratio
FROM customers
HAVING CAST(COUNT(DISTINCT email) AS FLOAT) / NULLIF(COUNT(*), 0) NOT BETWEEN 0.8 AND 1.0;
#+end
Distinct Value Count in Range
Great Expectations: ExpectColumnUniqueValueCountToBeBetween
CoginitiScript Equivalent:
#+test sql DistinctCountInRange()
#+begin
SELECT
'Distinct count out of range' as issue,
COUNT(DISTINCT state) as distinct_states
FROM addresses
HAVING COUNT(DISTINCT state) NOT BETWEEN 5 AND 50;
#+end
Values Unique Within Row
Great Expectations: ExpectSelectColumnValuesToBeUniqueWithinRecord
CoginitiScript Equivalent:
#+test sql ValuesUniqueWithinRow()
#+begin
SELECT
contact_id,
phone1,
phone2,
phone3
FROM contacts
WHERE phone1 = phone2
OR phone1 = phone3
OR phone2 = phone3;
#+end
Detailed Duplicate Analysis
CoginitiScript Pattern:
#+test sql DetailedDuplicateAnalysis()
#+begin
SELECT
username,
COUNT(*) as occurrence_count,
STRING_AGG(user_id::TEXT, ', ') as conflicting_ids
FROM users
GROUP BY username
HAVING COUNT(*) > 1
ORDER BY occurrence_count DESC;
#+end
Schema Validation Tests
Schema tests validate table structure, column presence, data types, and schema conformance.
Column Exists
Great Expectations: ExpectColumnToExist
CoginitiScript Equivalent:
#+test sql ColumnExists()
#+begin
SELECT 'Column email does not exist' as issue
WHERE NOT EXISTS (
    SELECT 1
    FROM information_schema.columns
    WHERE table_name = 'users'
      AND column_name = 'email'
      AND table_schema = 'public'
);
#+end
Exact Column Count
Great Expectations: ExpectTableColumnCountToEqual
CoginitiScript Equivalent:
#+test sql ExactColumnCount()
#+begin
SELECT
'Column count mismatch' as issue,
COUNT(*) as actual_count,
15 as expected_count
FROM information_schema.columns
WHERE table_name = 'orders'
AND table_schema = 'public'
HAVING COUNT(*) != 15;
#+end
Column Count in Range
Great Expectations: ExpectTableColumnCountToBeBetween
CoginitiScript Equivalent:
#+test sql ColumnCountInRange()
#+begin
SELECT
'Column count out of range' as issue,
COUNT(*) as actual_count
FROM information_schema.columns
WHERE table_name = 'transactions'
AND table_schema = 'public'
HAVING COUNT(*) NOT BETWEEN 10 AND 20;
#+end
Columns Match Expected Set
Great Expectations: ExpectTableColumnsToMatchSet
CoginitiScript Equivalent:
#+test sql ColumnsMatchSet()
#+begin
WITH expected_columns AS (
    SELECT unnest(ARRAY['id', 'name', 'email', 'created_at']) as col_name
),
actual_columns AS (
    SELECT column_name as col_name
    FROM information_schema.columns
    WHERE table_name = 'users' AND table_schema = 'public'
),
missing AS (
    SELECT col_name FROM expected_columns
    EXCEPT
    SELECT col_name FROM actual_columns
),
extra AS (
    SELECT col_name FROM actual_columns
    EXCEPT
    SELECT col_name FROM expected_columns
)
SELECT
    'Schema mismatch' as issue,
    COALESCE((SELECT STRING_AGG(col_name, ', ') FROM missing), '') as missing_columns,
    COALESCE((SELECT STRING_AGG(col_name, ', ') FROM extra), '') as extra_columns
FROM (SELECT 1) dummy
WHERE EXISTS (SELECT 1 FROM missing)
   OR EXISTS (SELECT 1 FROM extra);
#+end
Columns in Expected Order
Great Expectations: ExpectTableColumnsToMatchOrderedList
CoginitiScript Equivalent:
#+test sql ColumnsInOrder()
#+begin
WITH expected AS (
    SELECT unnest(ARRAY['id', 'customer_id', 'order_date', 'amount']) as col_name,
           generate_series(1, 4) as position
),
actual AS (
    SELECT column_name as col_name, ordinal_position as position
    FROM information_schema.columns
    WHERE table_name = 'orders' AND table_schema = 'public'
)
SELECT
    e.position,
    e.col_name as expected_column,
    a.col_name as actual_column
FROM expected e
FULL OUTER JOIN actual a ON e.position = a.position
WHERE e.col_name IS DISTINCT FROM a.col_name;
#+end
Column Type Validation
Great Expectations: ExpectColumnValuesToBeOfType
CoginitiScript Equivalent:
#+test sql ColumnType()
#+begin
SELECT
'Type mismatch for column email' as issue,
data_type as actual_type,
'character varying' as expected_type
FROM information_schema.columns
WHERE table_name = 'users'
AND table_schema = 'public'
AND column_name = 'email'
AND data_type != 'character varying';
#+end
Column Type in Allowed List
Great Expectations: ExpectColumnValuesToBeInTypeList
CoginitiScript Equivalent:
#+test sql ColumnTypeInList()
#+begin
SELECT
'Invalid type for amount column' as issue,
data_type as actual_type
FROM information_schema.columns
WHERE table_name = 'transactions'
AND table_schema = 'public'
AND column_name = 'amount'
AND data_type NOT IN ('numeric', 'decimal', 'double precision', 'money');
#+end
Multi-Column Type Validation
CoginitiScript Pattern:
#+test sql MultiColumnTypeValidation()
#+begin
WITH expected_types AS (
    SELECT 'id' as col, 'integer' as expected_type
    UNION ALL SELECT 'email', 'character varying'
    UNION ALL SELECT 'created_at', 'timestamp without time zone'
    UNION ALL SELECT 'balance', 'numeric'
)
SELECT
    e.col as column_name,
    e.expected_type,
    c.data_type as actual_type
FROM expected_types e
LEFT JOIN information_schema.columns c
    ON c.column_name = e.col
    AND c.table_name = 'accounts'
    AND c.table_schema = 'public'
WHERE c.data_type IS NULL
   OR c.data_type != e.expected_type;
#+end
Data Integrity Tests
Data integrity tests validate relationships between columns and ensure consistency.
Column Pair Values Equal
Great Expectations: ExpectColumnPairValuesToBeEqual
CoginitiScript Equivalent:
#+test sql ColumnPairEqual()
#+begin
SELECT
transaction_id,
sender_reference,
recipient_confirmation
FROM transactions
WHERE sender_reference != recipient_confirmation;
#+end
Multi-Column Sum Equals Expected
Great Expectations: ExpectMulticolumnSumToEqual
CoginitiScript Equivalent:
#+test sql MultiColumnSum()
#+begin
SELECT
account_id,
debits,
credits,
(debits + credits) as total
FROM account_balances
WHERE (debits + credits) != 0;
#+end
Column A Greater Than Column B
Great Expectations: ExpectColumnPairValuesAToBeGreaterThanB
CoginitiScript Equivalent:
#+test sql ColumnAGreaterThanB()
#+begin
SELECT
transfer_id,
sent_timestamp,
received_timestamp
FROM transfers
WHERE sent_timestamp >= received_timestamp;
#+end
Cross-Column Calculation Validation
CoginitiScript Pattern:
#+test sql CrossColumnPercentage()
#+begin
SELECT
order_id,
subtotal,
tax,
total,
(subtotal + tax) as calculated_total
FROM orders
WHERE ABS(total - (subtotal + tax)) > 0.01;
#+end
Referential Integrity Check
CoginitiScript Pattern:
#+test sql ReferentialIntegrity()
#+begin
SELECT
o.order_id,
o.customer_id
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
#+end
Cross-Table Balance Reconciliation
CoginitiScript Pattern:
#+test sql CrossTableBalance()
#+begin
SELECT
    'Balance mismatch' as issue,
    source_total,
    target_total,
    (source_total - target_total) as difference
FROM (
    SELECT SUM(amount) as source_total
    FROM source_transactions
    WHERE date = CURRENT_DATE
) s
CROSS JOIN (
    SELECT SUM(amount) as target_total
    FROM target_transactions
    WHERE date = CURRENT_DATE
) t
WHERE ABS(source_total - target_total) > 0.01;
#+end
Volume Tests
Volume tests validate row counts and detect volume anomalies.
Row Count Between Range
Great Expectations: ExpectTableRowCountToBeBetween
CoginitiScript Equivalent:
#+test sql RowCountInRange()
#+begin
SELECT
'Row count out of range' as issue,
COUNT(*) as actual_count,
1000 as min_expected,
5000 as max_expected
FROM daily_transactions
WHERE transaction_date = CURRENT_DATE
HAVING COUNT(*) NOT BETWEEN 1000 AND 5000;
#+end
Exact Row Count
Great Expectations: ExpectTableRowCountToEqual
CoginitiScript Equivalent:
#+test sql ExactRowCount()
#+begin
SELECT
'Row count mismatch' as issue,
COUNT(*) as actual_count,
100 as expected_count
FROM batch_import
WHERE batch_id = 'BATCH_2024_001'
HAVING COUNT(*) != 100;
#+end
Row Count Matches Other Table
Great Expectations: ExpectTableRowCountToEqualOtherTable
CoginitiScript Equivalent:
#+test sql RowCountMatchesTable()
#+begin
SELECT
    'Row count mismatch between tables' as issue,
    source_count,
    target_count,
    (source_count - target_count) as difference
FROM (
    SELECT COUNT(*) as source_count
    FROM source_table
    WHERE processed_date = CURRENT_DATE
) s
CROSS JOIN (
    SELECT COUNT(*) as target_count
    FROM target_table
    WHERE processed_date = CURRENT_DATE
) t
WHERE source_count != target_count;
#+end
Row Count Growth Validation
CoginitiScript Pattern:
#+test sql VolumeGrowth()
#+begin
WITH daily_counts AS (
    SELECT COUNT(*) as today_count
    FROM user_events
    WHERE event_date = CURRENT_DATE
),
yesterday_counts AS (
    SELECT COUNT(*) as yesterday_count
    FROM user_events
    WHERE event_date = CURRENT_DATE - INTERVAL '1 day'
)
SELECT
    'Unexpected volume change' as issue,
    today_count,
    yesterday_count,
    ROUND(100.0 * (today_count - yesterday_count) / NULLIF(yesterday_count, 0), 2) as pct_change
FROM daily_counts
CROSS JOIN yesterday_counts
WHERE today_count < yesterday_count * 0.8
   OR today_count > yesterday_count * 2.0;
#+end
Partition Volume Check
CoginitiScript Pattern:
#+test sql PartitionVolume()
#+begin
SELECT
product_category,
COUNT(*) as row_count
FROM sales
WHERE sale_date = CURRENT_DATE
GROUP BY product_category
HAVING COUNT(*) < 10;
#+end
Missing Values and Completeness Tests
Completeness tests validate required data presence and null value patterns.
Column Values Not Null
Great Expectations: ExpectColumnValuesToNotBeNull
CoginitiScript Equivalent:
#+test sql NoNullValues()
#+begin
SELECT
user_id,
email
FROM users
WHERE email IS NULL;
#+end
Column Values Are Null
Great Expectations: ExpectColumnValuesToBeNull
CoginitiScript Equivalent:
#+test sql ValuesAreNull()
#+begin
SELECT
order_id,
cancellation_reason
FROM orders
WHERE status = 'completed'
AND cancellation_reason IS NOT NULL;
#+end
Non-Null Proportion in Range
Great Expectations: ExpectColumnProportionOfNonNullValuesToBeBetween
CoginitiScript Equivalent:
#+test sql CompletenessRatio()
#+begin
SELECT
'Completeness out of range' as issue,
CAST(COUNT(phone_number) AS FLOAT) / NULLIF(COUNT(*), 0) as completeness_ratio
FROM customers
HAVING CAST(COUNT(phone_number) AS FLOAT) / NULLIF(COUNT(*), 0) NOT BETWEEN 0.7 AND 1.0;
#+end
Null Count Threshold
CoginitiScript Pattern:
#+test sql NullCountThreshold()
#+begin
SELECT
'Too many null values' as issue,
COUNT(*) as total_rows,
COUNT(*) - COUNT(middle_name) as null_count,
ROUND(100.0 * (COUNT(*) - COUNT(middle_name)) / NULLIF(COUNT(*), 0), 2) as null_pct
FROM employees
HAVING (COUNT(*) - COUNT(middle_name)) > 100;
#+end
Multi-Column Completeness
CoginitiScript Pattern:
#+test sql MultiColumnCompleteness()
#+begin
SELECT
customer_id,
email,
phone,
mailing_address
FROM customers
WHERE email IS NULL
AND phone IS NULL
AND mailing_address IS NULL;
#+end
Conditional Completeness
CoginitiScript Pattern:
#+test sql ConditionalCompleteness()
#+begin
SELECT
order_id,
shipping_address,
order_type
FROM orders
WHERE order_type = 'physical'
AND shipping_address IS NULL;
#+end
Completeness Trend Check
CoginitiScript Pattern:
#+test sql CompletenessTrend()
#+begin
WITH today_completeness AS (
    SELECT
        CAST(COUNT(optional_field) AS FLOAT) / NULLIF(COUNT(*), 0) as completeness
    FROM records
    WHERE created_date = CURRENT_DATE
)
SELECT
    'Completeness degradation' as issue,
    completeness as today_completeness
FROM today_completeness
WHERE completeness < 0.8;
#+end
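The query above compares today's completeness against a fixed floor. To check an actual day-over-day trend, a sketch like the following (reusing the same hypothetical records table and optional_field column) compares today's ratio with the ratio observed over the trailing seven days:
#+test sql CompletenessVsTrailingWeek()
#+begin
-- Sketch only: table and column names are the same assumptions used above.
-- Flags a day whose completeness drops more than 10% below the completeness
-- observed over the trailing seven days, rather than a fixed floor.
WITH current_day AS (
    SELECT
        CAST(COUNT(optional_field) AS FLOAT) / NULLIF(COUNT(*), 0) as completeness
    FROM records
    WHERE created_date = CURRENT_DATE
),
trailing_week AS (
    SELECT
        CAST(COUNT(optional_field) AS FLOAT) / NULLIF(COUNT(*), 0) as completeness
    FROM records
    WHERE created_date >= CURRENT_DATE - INTERVAL '7 days'
      AND created_date < CURRENT_DATE
)
SELECT
    'Completeness below trailing week' as issue,
    c.completeness as today_completeness,
    t.completeness as trailing_week_completeness
FROM current_day c
CROSS JOIN trailing_week t
WHERE c.completeness < t.completeness * 0.9;
#+end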
Additional Pattern Tests
String Format Validation
CoginitiScript Pattern:
#+test sql EmailFormat()
#+begin
SELECT
user_id,
email
FROM users
WHERE email NOT LIKE '%_@__%.__%'
OR email LIKE '%..%'
OR email LIKE '%@%@%';
#+end
Date Range Validation
CoginitiScript Pattern:
#+test sql DateRange()
#+begin
SELECT
event_id,
event_date
FROM events
WHERE event_date < '1900-01-01'
OR event_date > CURRENT_DATE + INTERVAL '10 years';
#+end
Enum/Category Validation
CoginitiScript Pattern:
#+test sql EnumValidation()
#+begin
SELECT
order_id,
status
FROM orders
WHERE status NOT IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled');
#+end
Sequence Gap Detection
CoginitiScript Pattern:
#+test sql SequenceGaps()
#+begin
WITH sequence_check AS (
    SELECT
        invoice_number,
        LEAD(invoice_number) OVER (ORDER BY invoice_number) as next_number
    FROM invoices
    WHERE created_date = CURRENT_DATE
)
SELECT
    invoice_number,
    next_number,
    (next_number - invoice_number) as gap_size
FROM sequence_check
WHERE next_number - invoice_number > 1;
#+end
Rate Limit Check
CoginitiScript Pattern:
#+test sql RateLimit()
#+begin
SELECT
user_id,
COUNT(*) as request_count
FROM api_requests
WHERE request_time >= CURRENT_TIMESTAMP - INTERVAL '1 hour'
GROUP BY user_id
HAVING COUNT(*) > 1000;
#+end
Database-Specific Syntax
PostgreSQL
-- String aggregation
STRING_AGG(column_name::TEXT, ', ')
-- Array creation
unnest(ARRAY['value1', 'value2'])
-- Percentiles
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY column)
MySQL
-- String aggregation
GROUP_CONCAT(column_name SEPARATOR ', ')
-- CTEs require MySQL 8.0 or later;
-- use derived-table subqueries on older versions
-- Array membership
FIND_IN_SET('value', column)
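As an illustration of those adjustments, a MySQL version of the earlier DetailedDuplicateAnalysis test might look like this sketch, with GROUP_CONCAT standing in for PostgreSQL's STRING_AGG (same hypothetical users table as above):
#+test sql DetailedDuplicateAnalysisMySQL()
#+begin
-- MySQL sketch: GROUP_CONCAT replaces PostgreSQL's STRING_AGG.
SELECT
    username,
    COUNT(*) as occurrence_count,
    GROUP_CONCAT(user_id SEPARATOR ', ') as conflicting_ids
FROM users
GROUP BY username
HAVING COUNT(*) > 1
ORDER BY occurrence_count DESC;
#+end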
SQL Server
-- String aggregation (2017+)
STRING_AGG(column_name, ', ')
-- Percentiles
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY column) OVER ()
-- Safe type conversion
TRY_CONVERT(INT, column)
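For example, TRY_CONVERT supports a test that flags values failing safe conversion; the sketch below assumes a staging_imports table with a text column quantity_text that should hold integers:
#+test sql NumericStringValidationSqlServer()
#+begin
-- SQL Server sketch (assumed table and column names): rows whose
-- quantity_text value cannot be safely converted to INT.
SELECT
    row_id,
    quantity_text
FROM staging_imports
WHERE quantity_text IS NOT NULL
  AND TRY_CONVERT(INT, quantity_text) IS NULL;
#+end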
Snowflake
-- String aggregation
LISTAGG(column_name, ', ')
-- Fast approximate percentiles
APPROX_PERCENTILE(column, 0.95)
-- Array creation
ARRAY_CONSTRUCT('value1', 'value2')
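For instance, the earlier PercentileThreshold test could be expressed in Snowflake with APPROX_PERCENTILE, avoiding the full sort that PERCENTILE_CONT requires (requests and latency are the same hypothetical names used above):
#+test sql PercentileThresholdSnowflake()
#+begin
-- Snowflake sketch: approximate percentile instead of PERCENTILE_CONT.
SELECT
    'P95 too high' as issue,
    p95
FROM (
    SELECT APPROX_PERCENTILE(latency, 0.95) as p95
    FROM requests
) latency_stats
WHERE p95 > 1000;
#+end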
Best Practices
Test Design
- Return diagnostic information: Include columns that help debug failures
- Use descriptive names: Test names should explain what's being validated
- Document thresholds: Explain why specific values matter
- Consider edge cases: Handle NULLs, empty tables, and boundary conditions (see the sketch after this list)
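A sketch applying these points, reusing the hypothetical orders table from earlier examples and assuming customer_id and order_date columns for context: it returns enough columns to debug each failure and explicitly treats NULL amounts as failures instead of letting them slip past the range check.
#+test sql OrderAmountInRangeWithContext()
#+begin
-- Sketch only (assumed columns): returns diagnostic context for each
-- failing row and flags NULL amounts rather than silently skipping them.
SELECT
    order_id,
    customer_id,
    order_date,
    order_amount,
    CASE
        WHEN order_amount IS NULL THEN 'missing amount'
        WHEN order_amount < 0 THEN 'negative amount'
        ELSE 'amount above 10000 limit'
    END as failure_reason
FROM orders
WHERE order_amount IS NULL
   OR order_amount NOT BETWEEN 0 AND 10000;
#+end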
Performance
- Index appropriately: Ensure test queries use existing indexes
- Limit data ranges: Test recent data for faster feedback
- Use sampling: For statistical tests on large datasets (combined with a limited date range in the sketch after this list)
- Partition tables: Enable efficient time-based filtering
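A sketch of both ideas, reusing the hypothetical products table and assuming it has a created_at column: the statistics come from a roughly 10% PostgreSQL TABLESAMPLE, and the outlier scan is limited to the last seven days.
#+test sql SampledZScoreOutliers()
#+begin
-- Sketch only: estimates mean and stddev from a ~10% sample, then
-- scans only the last 7 days of data for outliers.
-- Table and column names are assumptions carried over from earlier examples.
WITH stats AS (
    SELECT
        AVG(price) as mean,
        STDDEV(price) as std
    FROM products TABLESAMPLE SYSTEM (10)
)
SELECT
    p.product_id,
    p.price,
    ABS(p.price - s.mean) / NULLIF(s.std, 0) as z_score
FROM products p
CROSS JOIN stats s
WHERE p.created_at >= CURRENT_DATE - INTERVAL '7 days'
  AND ABS(p.price - s.mean) / NULLIF(s.std, 0) > 3;
#+end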
Organization
- Group by test type: Separate uniqueness, completeness, integrity tests
- Layer by severity: Critical tests separate from monitoring
- Version control: Track test changes with schema evolution
- Document dependencies: Note relationships between tests
Maintenance
- Review test failures: Don't ignore or auto-skip failing tests
- Update thresholds: Adjust ranges as business evolves
- Archive obsolete tests: Remove tests for deprecated features
- Monitor test performance: Optimize slow-running tests