CoginitiScript: Working with LLM Blocks
LLM blocks are scheduled to ship in the Coginiti 26.6 release. This guide is published as a preview so you can plan ahead — the feature is not yet available in earlier versions, and some details may change before general availability.
LLM blocks let you generate structured data from AI models directly inside CoginitiScript. This guide covers practical patterns for using LLM blocks in your data workflows — from generating test data to enriching datasets and building AI-powered pipelines.
What You'll Learn
- How to define LLM blocks with output schemas
- How to use LLM-generated data in SQL queries
- How to pass context from SQL into LLM prompts
- How to chain multiple LLM blocks together
- How to publish LLM results to database tables
- How to control caching behavior
Prerequisites
- Basic understanding of CoginitiScript syntax (see CoginitiScript Reference)
- Familiarity with SQL
- AI Assistant configured in your Coginiti instance (see How to Configure AI Assistant)
- A database connection that supports table creation from recordsets (e.g., PostgreSQL, Snowflake, BigQuery, Greenplum, Redshift)
Generating Structured Data with LLM Blocks
The Basics
An LLM block sends a prompt to the AI model and returns tabular data. Define the block with #+src llm, declare the expected output columns in :schema, and write your prompt in the body:
#+src llm SampleCustomers()
#+meta {
:schema {
:columns [{:name "id", :type "INT", :description "Customer ID"},
{:name "name", :type "STRING", :description "Full name"},
{:name "email", :type "STRING", :description "Email address"},
{:name "tier", :type "STRING", :description "Bronze, Silver, or Gold"}]
}
}
#+begin
Generate 10 realistic sample customer records for an e-commerce platform.
Include a mix of Bronze, Silver, and Gold tier customers.
#+end
Execute this block to see the AI-generated customer data as a result set.
Using Schema Descriptions for Better Results
The :description field in each column map is included in the system prompt sent to the AI model. Use it to guide the LLM toward generating appropriate values:
#+meta {
:schema {
:columns [{:name "score", :type "DECIMAL(5,2)", :description "NPS score between -100.00 and 100.00"},
{:name "date", :type "DATE", :description "Date in the last 30 days"}]
}
}
Supported Data Types
LLM blocks support all standard types. Use parameterized types for precision:
:schema {
:columns [{:name "id", :type "INT"},
{:name "name", :type "STRING"},
{:name "price", :type "DECIMAL(10,2)"},
{:name "weight", :type "FLOAT"},
{:name "active", :type "BOOL"},
{:name "created", :type "DATE"},
{:name "updated", :type "TIMESTAMP(0)"},
{:name "check_at", :type "TIME(2)"},
{:name "thumbnail", :type "BINARY"}]
}
Querying LLM Data with SQL
Basic SQL Reference
Reference an LLM block from SQL using the standard {{ }} syntax:
#+src llm inventory()
#+meta {
:schema {
:columns [{:name "product", :type "STRING"},
{:name "quantity", :type "INT"},
{:name "in_stock", :type "BOOL"}]
}
}
#+begin
Generate warehouse inventory for 20 electronics products.
#+end
#+src sql LowStockReport()
#+begin
SELECT product, quantity
FROM {{ inventory() }}
WHERE in_stock = true AND quantity < 10
ORDER BY quantity ASC
#+end
The LLM block's output is materialized as a temporary table, allowing full SQL operations — filtering, aggregation, sorting, and joins.
Joining Multiple LLM Blocks
#+src llm customers()
#+meta {
:schema {
:columns [{:name "id", :type "INT"},
{:name "name", :type "STRING"}]
}
}
#+begin
Generate 5 customer records.
#+end
#+src llm orders()
#+meta {
:schema {
:columns [{:name "order_id", :type "INT"},
{:name "customer_id", :type "INT"},
{:name "amount", :type "DECIMAL(10,2)"}]
}
}
#+begin
Generate 15 order records with customer_id values between 1 and 5.
#+end
#+src sql CustomerOrderSummary()
#+begin
SELECT c.name, COUNT(*) AS order_count, SUM(o.amount) AS total_spent
FROM {{ customers() }} c
JOIN {{ orders() }} o ON c.id = o.customer_id
GROUP BY c.name
ORDER BY total_spent DESC
#+end
Passing SQL Data into LLM Prompts
Use print.Csv() to embed SQL query results as CSV text in an LLM prompt. This lets the AI analyze your actual data:
#+import "std/print"
#+src sql TopProducts()
#+begin
SELECT product_name, total_sales, return_rate
FROM product_metrics
ORDER BY total_sales DESC
LIMIT 20
#+end
#+src llm ProductInsights()
#+meta {
:schema {
:columns [{:name "product", :type "STRING"},
{:name "insight", :type "STRING"},
{:name "risk_level", :type "STRING", :description "LOW, MEDIUM, or HIGH"}]
}
}
#+begin
Analyze the following product performance data and provide insights.
Flag products with high return rates as risks.
{{ print.Csv(TopProducts()) }}
#+end
#+src sql RiskReport()
#+begin
SELECT product, insight, risk_level
FROM {{ ProductInsights() }}
WHERE risk_level = 'HIGH'
#+end
Chaining LLM Blocks
LLM blocks cannot be used as table references from other LLM blocks (i.e., {{ llmBlock() }} as a relation is not allowed). However, you can pass data between LLM blocks using print.Csv() or iterator().
Using print.Csv() for Bulk Data
Use print.Csv() to embed another LLM block's entire result as CSV text in a prompt — this works the same way as embedding SQL data:
#+import "std/print"
#+src llm RawFindings()
#+meta {
:schema {
:columns [{:name "id", :type "INT"},
{:name "finding", :type "STRING"},
{:name "severity", :type "STRING"}]
}
}
#+begin
Generate 10 security audit findings for a web application.
#+end
#+src llm PrioritizedFindings()
#+meta {
:schema {
:columns [{:name "id", :type "INT"},
{:name "priority", :type "INT", :description "1=highest, 5=lowest"},
{:name "action", :type "STRING"}]
}
}
#+begin
Review the following security findings and assign a priority (1=highest to 5=lowest).
For each finding, recommend a specific remediation action.
{{ print.Csv(RawFindings()) }}
#+end
Using iterator() for Row-Level Access
Use iterator() when you need to access individual rows or specific columns:
#+src llm ExtractTopics()
#+meta {
:schema {
:columns [{:name "topic", :type "STRING"},
{:name "score", :type "FLOAT"}]
}
}
#+begin
Identify the top 5 trending data engineering topics for this quarter.
#+end
#+src llm GenerateArticleOutlines()
#+meta {
:schema {
:columns [{:name "topic", :type "STRING"},
{:name "outline", :type "STRING"}]
}
}
#+begin
Generate article outlines for the following topics:
#+for row : iterator(ExtractTopics()) separator "\n" do
- {{ row["topic"] }} (relevance score: {{ row["score"] }})
#+end
#+end
Access columns by name (row["topic"]) or by index (row[0]).
Parameterizing LLM Blocks
Block Arguments
LLM blocks accept arguments just like SQL blocks:
#+src llm GenerateTestData(table_name, row_count)
#+meta {
:schema {
:columns [{:name "sql", :type "STRING"}]
}
}
#+begin
Generate {{ row_count }} realistic INSERT statements for the {{ table_name }} table.
#+end
Script Parameters
Use $$ parameters for values passed at execution time:
#+src llm AnalyzeRegion()
#+meta {
:schema {
:columns [{:name "metric", :type "STRING"},
{:name "value", :type "FLOAT"}]
}
}
#+begin
Generate key performance metrics for the $$region region in $$year.
#+end
Using Constants
Define constants with #+const for reuse within the prompt body:
#+src llm BuildReport()
#+meta {
:schema {
:columns [{:name "report", :type "STRING"}]
}
}
#+const
departments = ["Sales", "Marketing", "Engineering"];
fiscal_year = 2025;
#+begin
Create a summary report for fiscal year {{ fiscal_year }}.
Departments to include:
#+for dept : departments separator ", " do{{ dept }}#+end.
#+end
Controlling Result Caching
By default, LLM block results are not cached. Enable caching when a block is referenced multiple times to avoid redundant LLM calls:
#+src llm SharedData()
#+meta {
:schema {
:columns [{:name "id", :type "INT"},
{:name "val", :type "STRING"}]
},
:cache_results true
}
#+begin
Generate shared reference data.
#+end
#+src sql CombinedView()
#+begin
-- Both references reuse the same cached result
SELECT id, val, 'source_a' AS origin FROM {{ SharedData() }}
UNION ALL
SELECT id, val, 'source_b' AS origin FROM {{ SharedData() }}
#+end
When caching is disabled (the default), each reference triggers a separate LLM call:
#+meta {
:schema { ... }
-- :cache_results defaults to false; no need to set it explicitly
}
Publishing LLM Results to Tables
LLM blocks support table publication with the same metadata as SQL blocks:
#+src llm GenerateMetrics()
#+meta {
:schema {
:columns [{:name "date_day", :type "DATE"},
{:name "metric", :type "STRING"},
{:name "value", :type "FLOAT"}]
},
:publication {
:type "table",
:name "llm_metrics",
:schema "analytics"
}
}
#+begin
Generate daily performance metrics for the past week.
#+end
Incremental Publication
Use append or merge strategies for incremental updates:
#+src llm DailySentiment()
#+meta {
:schema {
:columns [{:name "date_day", :type "DATE"},
{:name "sentiment", :type "FLOAT"},
{:name "summary", :type "STRING"}]
},
:publication {
:type "table",
:name "daily_sentiment",
:incremental "merge",
:unique_key ["date_day"]
}
}
#+begin
Analyze today's customer feedback and generate a sentiment score.
#+end
Running Publications Programmatically
#+import "std/publication"
{{ publication.Run(blocks=[GenerateMetrics, DailySentiment], parallelism=2) }}
View publication is not supported for LLM blocks. Use :type "table" only.
Common Patterns
Test Data Generation
#+src llm TestOrders()
#+meta {
:schema {
:columns [{:name "order_id", :type "INT"},
{:name "customer_id", :type "INT"},
{:name "product", :type "STRING"},
{:name "amount", :type "DECIMAL(10,2)"},
{:name "status", :type "STRING", :description "pending, shipped, or delivered"}]
}
}
#+begin
Generate 50 realistic e-commerce order records.
Ensure a realistic distribution of statuses: 20% pending, 30% shipped, 50% delivered.
#+end
Data Enrichment
#+import "std/print"
#+src sql RawAddresses()
#+begin
SELECT address_id, street_address, city, state FROM addresses WHERE country_code IS NULL LIMIT 100
#+end
#+src llm EnrichedAddresses()
#+meta {
:schema {
:columns [{:name "address_id", :type "INT"},
{:name "country_code", :type "STRING", :description "ISO 3166-1 alpha-2 country code"}]
}
}
#+begin
For each address below, determine the most likely country code (ISO 3166-1 alpha-2).
{{ print.Csv(RawAddresses()) }}
#+end
#+src sql UpdateAddresses()
#+begin
UPDATE addresses a
SET country_code = e.country_code
FROM {{ EnrichedAddresses() }} e
WHERE a.address_id = e.address_id
#+end
Conditional LLM Prompt Generation
#+src llm FlexibleAnalysis(detail_level)
#+meta {
:schema {
:columns [{:name "finding", :type "STRING"},
{:name "severity", :type "STRING"}]
}
}
#+const
is_detailed = detail_level == "high";
#+begin
Analyze system health metrics and generate findings.
#+if is_detailed then
Include root cause analysis and remediation steps for each finding.
Provide severity as CRITICAL, HIGH, MEDIUM, or LOW.
#+else
Provide a high-level summary only.
Provide severity as HIGH or LOW.
#+end
#+end
Limitations
- Platform support: SQL blocks can only reference LLM blocks on platforms that support creating tables from recordset data. Athena and SQL Server are not currently supported.
- No direct LLM-to-LLM table references:
{{ llmBlock() }}cannot be used as a relation inside another LLM block. Useprint.Csv()oriterator()instead. - No view publication: Only table publication is supported for LLM blocks.
- Schema required for SQL references: LLM blocks referenced by SQL blocks should have a
:schemadefined.
Next Steps
- CoginitiScript Reference: LLM Blocks — Full syntax reference
- Building an AI-Powered Data Pipeline Tutorial — End-to-end tutorial
- CoginitiScript: Generating SQL Dynamically — Dynamic SQL patterns that pair well with LLM blocks