Skip to main content

CoginitiScript: Working with LLM Blocks

Coming in Coginiti 26.6

LLM blocks are scheduled to ship in the Coginiti 26.6 release. This guide is published as a preview so you can plan ahead — the feature is not yet available in earlier versions, and some details may change before general availability.

LLM blocks let you generate structured data from AI models directly inside CoginitiScript. This guide covers practical patterns for using LLM blocks in your data workflows — from generating test data to enriching datasets and building AI-powered pipelines.

What You'll Learn

  • How to define LLM blocks with output schemas
  • How to use LLM-generated data in SQL queries
  • How to pass context from SQL into LLM prompts
  • How to chain multiple LLM blocks together
  • How to publish LLM results to database tables
  • How to control caching behavior

Prerequisites

  • Basic understanding of CoginitiScript syntax (see CoginitiScript Reference)
  • Familiarity with SQL
  • AI Assistant configured in your Coginiti instance (see How to Configure AI Assistant)
  • A database connection that supports table creation from recordsets (e.g., PostgreSQL, Snowflake, BigQuery, Greenplum, Redshift)

Generating Structured Data with LLM Blocks

The Basics

An LLM block sends a prompt to the AI model and returns tabular data. Define the block with #+src llm, declare the expected output columns in :schema, and write your prompt in the body:

#+src llm SampleCustomers()
#+meta {
:schema {
:columns [{:name "id", :type "INT", :description "Customer ID"},
{:name "name", :type "STRING", :description "Full name"},
{:name "email", :type "STRING", :description "Email address"},
{:name "tier", :type "STRING", :description "Bronze, Silver, or Gold"}]
}
}
#+begin
Generate 10 realistic sample customer records for an e-commerce platform.
Include a mix of Bronze, Silver, and Gold tier customers.
#+end

Execute this block to see the AI-generated customer data as a result set.

Using Schema Descriptions for Better Results

The :description field in each column map is included in the system prompt sent to the AI model. Use it to guide the LLM toward generating appropriate values:

#+meta {
:schema {
:columns [{:name "score", :type "DECIMAL(5,2)", :description "NPS score between -100.00 and 100.00"},
{:name "date", :type "DATE", :description "Date in the last 30 days"}]
}
}

Supported Data Types

LLM blocks support all standard types. Use parameterized types for precision:

:schema {
:columns [{:name "id", :type "INT"},
{:name "name", :type "STRING"},
{:name "price", :type "DECIMAL(10,2)"},
{:name "weight", :type "FLOAT"},
{:name "active", :type "BOOL"},
{:name "created", :type "DATE"},
{:name "updated", :type "TIMESTAMP(0)"},
{:name "check_at", :type "TIME(2)"},
{:name "thumbnail", :type "BINARY"}]
}

Querying LLM Data with SQL

Basic SQL Reference

Reference an LLM block from SQL using the standard {{ }} syntax:

#+src llm inventory()
#+meta {
:schema {
:columns [{:name "product", :type "STRING"},
{:name "quantity", :type "INT"},
{:name "in_stock", :type "BOOL"}]
}
}
#+begin
Generate warehouse inventory for 20 electronics products.
#+end

#+src sql LowStockReport()
#+begin
SELECT product, quantity
FROM {{ inventory() }}
WHERE in_stock = true AND quantity < 10
ORDER BY quantity ASC
#+end

The LLM block's output is materialized as a temporary table, allowing full SQL operations — filtering, aggregation, sorting, and joins.

Joining Multiple LLM Blocks

#+src llm customers()
#+meta {
:schema {
:columns [{:name "id", :type "INT"},
{:name "name", :type "STRING"}]
}
}
#+begin
Generate 5 customer records.
#+end

#+src llm orders()
#+meta {
:schema {
:columns [{:name "order_id", :type "INT"},
{:name "customer_id", :type "INT"},
{:name "amount", :type "DECIMAL(10,2)"}]
}
}
#+begin
Generate 15 order records with customer_id values between 1 and 5.
#+end

#+src sql CustomerOrderSummary()
#+begin
SELECT c.name, COUNT(*) AS order_count, SUM(o.amount) AS total_spent
FROM {{ customers() }} c
JOIN {{ orders() }} o ON c.id = o.customer_id
GROUP BY c.name
ORDER BY total_spent DESC
#+end

Passing SQL Data into LLM Prompts

Use print.Csv() to embed SQL query results as CSV text in an LLM prompt. This lets the AI analyze your actual data:

#+import "std/print"

#+src sql TopProducts()
#+begin
SELECT product_name, total_sales, return_rate
FROM product_metrics
ORDER BY total_sales DESC
LIMIT 20
#+end

#+src llm ProductInsights()
#+meta {
:schema {
:columns [{:name "product", :type "STRING"},
{:name "insight", :type "STRING"},
{:name "risk_level", :type "STRING", :description "LOW, MEDIUM, or HIGH"}]
}
}
#+begin
Analyze the following product performance data and provide insights.
Flag products with high return rates as risks.

{{ print.Csv(TopProducts()) }}
#+end

#+src sql RiskReport()
#+begin
SELECT product, insight, risk_level
FROM {{ ProductInsights() }}
WHERE risk_level = 'HIGH'
#+end

Chaining LLM Blocks

LLM blocks cannot be used as table references from other LLM blocks (i.e., {{ llmBlock() }} as a relation is not allowed). However, you can pass data between LLM blocks using print.Csv() or iterator().

Using print.Csv() for Bulk Data

Use print.Csv() to embed another LLM block's entire result as CSV text in a prompt — this works the same way as embedding SQL data:

#+import "std/print"

#+src llm RawFindings()
#+meta {
:schema {
:columns [{:name "id", :type "INT"},
{:name "finding", :type "STRING"},
{:name "severity", :type "STRING"}]
}
}
#+begin
Generate 10 security audit findings for a web application.
#+end

#+src llm PrioritizedFindings()
#+meta {
:schema {
:columns [{:name "id", :type "INT"},
{:name "priority", :type "INT", :description "1=highest, 5=lowest"},
{:name "action", :type "STRING"}]
}
}
#+begin
Review the following security findings and assign a priority (1=highest to 5=lowest).
For each finding, recommend a specific remediation action.

{{ print.Csv(RawFindings()) }}
#+end

Using iterator() for Row-Level Access

Use iterator() when you need to access individual rows or specific columns:

#+src llm ExtractTopics()
#+meta {
:schema {
:columns [{:name "topic", :type "STRING"},
{:name "score", :type "FLOAT"}]
}
}
#+begin
Identify the top 5 trending data engineering topics for this quarter.
#+end

#+src llm GenerateArticleOutlines()
#+meta {
:schema {
:columns [{:name "topic", :type "STRING"},
{:name "outline", :type "STRING"}]
}
}
#+begin
Generate article outlines for the following topics:
#+for row : iterator(ExtractTopics()) separator "\n" do
- {{ row["topic"] }} (relevance score: {{ row["score"] }})
#+end
#+end

Access columns by name (row["topic"]) or by index (row[0]).

Parameterizing LLM Blocks

Block Arguments

LLM blocks accept arguments just like SQL blocks:

#+src llm GenerateTestData(table_name, row_count)
#+meta {
:schema {
:columns [{:name "sql", :type "STRING"}]
}
}
#+begin
Generate {{ row_count }} realistic INSERT statements for the {{ table_name }} table.
#+end

Script Parameters

Use $$ parameters for values passed at execution time:

#+src llm AnalyzeRegion()
#+meta {
:schema {
:columns [{:name "metric", :type "STRING"},
{:name "value", :type "FLOAT"}]
}
}
#+begin
Generate key performance metrics for the $$region region in $$year.
#+end

Using Constants

Define constants with #+const for reuse within the prompt body:

#+src llm BuildReport()
#+meta {
:schema {
:columns [{:name "report", :type "STRING"}]
}
}
#+const
departments = ["Sales", "Marketing", "Engineering"];
fiscal_year = 2025;
#+begin
Create a summary report for fiscal year {{ fiscal_year }}.
Departments to include:
#+for dept : departments separator ", " do{{ dept }}#+end.
#+end

Controlling Result Caching

By default, LLM block results are not cached. Enable caching when a block is referenced multiple times to avoid redundant LLM calls:

#+src llm SharedData()
#+meta {
:schema {
:columns [{:name "id", :type "INT"},
{:name "val", :type "STRING"}]
},
:cache_results true
}
#+begin
Generate shared reference data.
#+end

#+src sql CombinedView()
#+begin
-- Both references reuse the same cached result
SELECT id, val, 'source_a' AS origin FROM {{ SharedData() }}
UNION ALL
SELECT id, val, 'source_b' AS origin FROM {{ SharedData() }}
#+end

When caching is disabled (the default), each reference triggers a separate LLM call:

#+meta {
:schema { ... }
-- :cache_results defaults to false; no need to set it explicitly
}

Publishing LLM Results to Tables

LLM blocks support table publication with the same metadata as SQL blocks:

#+src llm GenerateMetrics()
#+meta {
:schema {
:columns [{:name "date_day", :type "DATE"},
{:name "metric", :type "STRING"},
{:name "value", :type "FLOAT"}]
},
:publication {
:type "table",
:name "llm_metrics",
:schema "analytics"
}
}
#+begin
Generate daily performance metrics for the past week.
#+end

Incremental Publication

Use append or merge strategies for incremental updates:

#+src llm DailySentiment()
#+meta {
:schema {
:columns [{:name "date_day", :type "DATE"},
{:name "sentiment", :type "FLOAT"},
{:name "summary", :type "STRING"}]
},
:publication {
:type "table",
:name "daily_sentiment",
:incremental "merge",
:unique_key ["date_day"]
}
}
#+begin
Analyze today's customer feedback and generate a sentiment score.
#+end

Running Publications Programmatically

#+import "std/publication"

{{ publication.Run(blocks=[GenerateMetrics, DailySentiment], parallelism=2) }}
warning

View publication is not supported for LLM blocks. Use :type "table" only.

Common Patterns

Test Data Generation

#+src llm TestOrders()
#+meta {
:schema {
:columns [{:name "order_id", :type "INT"},
{:name "customer_id", :type "INT"},
{:name "product", :type "STRING"},
{:name "amount", :type "DECIMAL(10,2)"},
{:name "status", :type "STRING", :description "pending, shipped, or delivered"}]
}
}
#+begin
Generate 50 realistic e-commerce order records.
Ensure a realistic distribution of statuses: 20% pending, 30% shipped, 50% delivered.
#+end

Data Enrichment

#+import "std/print"

#+src sql RawAddresses()
#+begin
SELECT address_id, street_address, city, state FROM addresses WHERE country_code IS NULL LIMIT 100
#+end

#+src llm EnrichedAddresses()
#+meta {
:schema {
:columns [{:name "address_id", :type "INT"},
{:name "country_code", :type "STRING", :description "ISO 3166-1 alpha-2 country code"}]
}
}
#+begin
For each address below, determine the most likely country code (ISO 3166-1 alpha-2).

{{ print.Csv(RawAddresses()) }}
#+end

#+src sql UpdateAddresses()
#+begin
UPDATE addresses a
SET country_code = e.country_code
FROM {{ EnrichedAddresses() }} e
WHERE a.address_id = e.address_id
#+end

Conditional LLM Prompt Generation

#+src llm FlexibleAnalysis(detail_level)
#+meta {
:schema {
:columns [{:name "finding", :type "STRING"},
{:name "severity", :type "STRING"}]
}
}
#+const
is_detailed = detail_level == "high";
#+begin
Analyze system health metrics and generate findings.
#+if is_detailed then
Include root cause analysis and remediation steps for each finding.
Provide severity as CRITICAL, HIGH, MEDIUM, or LOW.
#+else
Provide a high-level summary only.
Provide severity as HIGH or LOW.
#+end
#+end

Limitations

  • Platform support: SQL blocks can only reference LLM blocks on platforms that support creating tables from recordset data. Athena and SQL Server are not currently supported.
  • No direct LLM-to-LLM table references: {{ llmBlock() }} cannot be used as a relation inside another LLM block. Use print.Csv() or iterator() instead.
  • No view publication: Only table publication is supported for LLM blocks.
  • Schema required for SQL references: LLM blocks referenced by SQL blocks should have a :schema defined.

Next Steps