CoginitiScript: Working with LLM Blocks

LLM blocks let you generate structured data from AI models directly inside CoginitiScript. This guide covers practical patterns for using LLM blocks in your data workflows — from generating test data to enriching datasets and building AI-powered pipelines.

What You'll Learn

How to define LLM blocks with output schemas
How to use LLM-generated data in SQL queries
How to pass context from SQL into LLM prompts
How to chain multiple LLM blocks together
How to publish LLM results to database tables
How to control caching behavior

Prerequisites

Basic understanding of CoginitiScript syntax (see CoginitiScript Reference)
Familiarity with SQL
AI Assistant configured in your Coginiti instance (see How to Configure AI Assistant)
A database connection that supports table creation from recordsets (e.g., PostgreSQL, Snowflake, BigQuery, Greenplum, Redshift)

Generating Structured Data with LLM Blocks

The Basics

An LLM block sends a prompt to the AI model and returns tabular data. Define the block with #+src llm, declare the expected output columns in :schema, and write your prompt in the body:

#+src llm SampleCustomers()
#+meta {
  :schema {
    :columns [{:name "id", :type "INT", :description "Customer ID"},
              {:name "name", :type "STRING", :description "Full name"},
              {:name "email", :type "STRING", :description "Email address"},
              {:name "tier", :type "STRING", :description "Bronze, Silver, or Gold"}]
  }
}
#+begin
Generate 10 realistic sample customer records for an e-commerce platform.
Include a mix of Bronze, Silver, and Gold tier customers.
#+end

Execute this block to see the AI-generated customer data as a result set.

note

LLM blocks use whichever AI provider and model is configured as your active AI Assistant — there is no per-block model setting, so every LLM block in a script uses that same provider.

Using Schema Descriptions for Better Results

The :description field in each column map is sent to the model as part of the column definition, steering the values it generates for that column. Use it to guide the LLM toward appropriate values:

#+meta {
  :schema {
    :columns [{:name "score", :type "DECIMAL(5,2)", :description "NPS score between -100.00 and 100.00"},
              {:name "date", :type "DATE", :description "Date in the last 30 days"}]
  }
}

Supported Data Types

LLM blocks support all standard types. Use parameterized types for precision:

:schema {
  :columns [{:name "id", :type "INT"},
            {:name "name", :type "STRING"},
            {:name "price", :type "DECIMAL(10,2)"},
            {:name "weight", :type "FLOAT"},
            {:name "active", :type "BOOL"},
            {:name "created", :type "DATE"},
            {:name "updated", :type "TIMESTAMP(0)"},
            {:name "check_at", :type "TIME(2)"},
            {:name "thumbnail", :type "BINARY"}]
}

Controlling Validation and Nullability

By default, columns are nullable and the AI response is validated leniently — values are converted to the declared types on a best-effort basis. Two optional schema keys give you tighter control:

:validation "strict" on the :schema map makes the model report ambiguous or unrepresentable values as an error instead of guessing, and fails (rather than coerces) values that don't match the declared type. Use it when correctness matters more than always getting a row.
:nullable false on a column requires a non-null value for that column in every row.

#+src llm StrictMetrics()
#+meta {
  :schema {
    :columns [{:name "metric", :type "STRING", :nullable false},
              {:name "value", :type "DECIMAL(10,2)", :nullable false}],
    :validation "strict"
  }
}
#+begin
Generate exact financial metrics. Do not approximate.
#+end

Reducing schema complexity on wide tables

Each nullable column is sent to the model as a union type (anyOf: [type, null]) in the request's JSON schema. Some providers cap how many union-typed fields a schema may contain — Anthropic, for example, rejects schemas with more than 16 union-typed parameters. On a wide table this can make the provider refuse the request outright. Marking columns that never need nulls as :nullable false emits them as a plain typed field (still required) instead of a union, lowering the union count and keeping the schema within provider limits.

Querying LLM Data with SQL

Basic SQL Reference

Reference an LLM block from SQL using the standard {{ }} syntax:

#+src llm inventory()
#+meta {
  :schema {
    :columns [{:name "product", :type "STRING"},
              {:name "quantity", :type "INT"},
              {:name "in_stock", :type "BOOL"}]
  }
}
#+begin
Generate warehouse inventory for 20 electronics products.
#+end

#+src sql LowStockReport()
#+begin
SELECT product, quantity
FROM {{ inventory() }}
WHERE in_stock = true AND quantity < 10
ORDER BY quantity ASC
#+end

The LLM block's output is materialized as a temporary table, allowing full SQL operations — filtering, aggregation, sorting, and joins.

Joining Multiple LLM Blocks

#+src llm customers()
#+meta {
  :schema {
    :columns [{:name "id", :type "INT"},
              {:name "name", :type "STRING"}]
  }
}
#+begin
Generate 5 customer records.
#+end

#+src llm orders()
#+meta {
  :schema {
    :columns [{:name "order_id", :type "INT"},
              {:name "customer_id", :type "INT"},
              {:name "amount", :type "DECIMAL(10,2)"}]
  }
}
#+begin
Generate 15 order records with customer_id values between 1 and 5.
#+end

#+src sql CustomerOrderSummary()
#+begin
SELECT c.name, COUNT(*) AS order_count, SUM(o.amount) AS total_spent
FROM {{ customers() }} c
JOIN {{ orders() }} o ON c.id = o.customer_id
GROUP BY c.name
ORDER BY total_spent DESC
#+end

Passing SQL Data into LLM Prompts

Use print.Csv() to embed SQL query results as CSV text in an LLM prompt. This lets the AI analyze your actual data:

#+import "std/print"

#+src sql TopProducts()
#+begin
SELECT product_name, total_sales, return_rate
FROM product_metrics
ORDER BY total_sales DESC
LIMIT 20
#+end

#+src llm ProductInsights()
#+meta {
  :schema {
    :columns [{:name "product", :type "STRING"},
              {:name "insight", :type "STRING"},
              {:name "risk_level", :type "STRING", :description "LOW, MEDIUM, or HIGH"}]
  }
}
#+begin
Analyze the following product performance data and provide insights.
Flag products with high return rates as risks.

{{ print.Csv(TopProducts()) }}
#+end

#+src sql RiskReport()
#+begin
SELECT product, insight, risk_level
FROM {{ ProductInsights() }}
WHERE risk_level = 'HIGH'
#+end

Chaining LLM Blocks

LLM blocks cannot be used as table references from other LLM blocks (i.e., {{ llmBlock() }} as a relation is not allowed). However, you can pass data between LLM blocks using print.Csv() or iterator().

Using `print.Csv()` for Bulk Data

Use print.Csv() to embed another LLM block's entire result as CSV text in a prompt — this works the same way as embedding SQL data:

#+import "std/print"

#+src llm RawFindings()
#+meta {
  :schema {
    :columns [{:name "id", :type "INT"},
              {:name "finding", :type "STRING"},
              {:name "severity", :type "STRING"}]
  }
}
#+begin
Generate 10 security audit findings for a web application.
#+end

#+src llm PrioritizedFindings()
#+meta {
  :schema {
    :columns [{:name "id", :type "INT"},
              {:name "priority", :type "INT", :description "1=highest, 5=lowest"},
              {:name "action", :type "STRING"}]
  }
}
#+begin
Review the following security findings and assign a priority (1=highest to 5=lowest).
For each finding, recommend a specific remediation action.

{{ print.Csv(RawFindings()) }}
#+end

Using `iterator()` for Row-Level Access

Use iterator() when you need to access individual rows or specific columns:

#+src llm ExtractTopics()
#+meta {
  :schema {
    :columns [{:name "topic", :type "STRING"},
              {:name "score", :type "FLOAT"}]
  }
}
#+begin
Identify the top 5 trending data engineering topics for this quarter.
#+end

#+src llm GenerateArticleOutlines()
#+meta {
  :schema {
    :columns [{:name "topic", :type "STRING"},
              {:name "outline", :type "STRING"}]
  }
}
#+begin
Generate article outlines for the following topics:
#+for row : iterator(ExtractTopics()) separator "\n" do
- {{ row["topic"] }} (relevance score: {{ row["score"] }})
#+end
#+end

Access columns by name (row["topic"]) or by index (row[0]).

Parameterizing LLM Blocks

Block Arguments

LLM blocks accept arguments just like SQL blocks:

#+src llm GenerateTestData(table_name, row_count)
#+meta {
  :schema {
    :columns [{:name "sql", :type "STRING"}]
  }
}
#+begin
Generate {{ row_count }} realistic INSERT statements for the {{ table_name }} table.
#+end

Script Parameters

Use $$ parameters for values passed at execution time:

#+src llm AnalyzeRegion()
#+meta {
  :schema {
    :columns [{:name "metric", :type "STRING"},
              {:name "value", :type "FLOAT"}]
  }
}
#+begin
Generate key performance metrics for the $$region region in $$year.
#+end

Using Constants

Define constants with #+const for reuse within the prompt body:

#+src llm BuildReport()
#+meta {
  :schema {
    :columns [{:name "report", :type "STRING"}]
  }
}
#+const
  departments = ["Sales", "Marketing", "Engineering"];
  fiscal_year = 2025;
#+begin
Create a summary report for fiscal year {{ fiscal_year }}.
Departments to include:
#+for dept : departments separator ", " do{{ dept }}#+end.
#+end

Controlling Result Caching

By default, LLM block results are cached: the block is called once per unique (block, arguments) combination within an execution scope, and the result is reused across all references in that scope. This avoids redundant LLM calls when a block is referenced multiple times:

#+src llm SharedData()
#+meta {
  :schema {
    :columns [{:name "id", :type "INT"},
              {:name "val", :type "STRING"}]
  }
}
#+begin
Generate shared reference data.
#+end

#+src sql CombinedView()
#+begin
-- Both references reuse the same cached result (a single LLM call)
SELECT id, val, 'source_a' AS origin FROM {{ SharedData() }}
UNION ALL
SELECT id, val, 'source_b' AS origin FROM {{ SharedData() }}
#+end

Set :cache_results false to force a fresh LLM call on every reference:

#+meta {
  :schema { ... },
  :cache_results false
}

Publishing LLM Results

LLM blocks support publication with the same metadata as SQL blocks — to a database table (below) or a CSV/Parquet file. Views are not supported:

#+src llm GenerateMetrics()
#+meta {
  :schema {
    :columns [{:name "date_day", :type "DATE"},
              {:name "metric", :type "STRING"},
              {:name "value", :type "FLOAT"}]
  },
  :publication {
    :type "table",
    :name "llm_metrics",
    :schema "analytics"
  }
}
#+begin
Generate daily performance metrics for the past week.
#+end

Incremental Publication

Use append or merge strategies for incremental updates:

#+src llm DailySentiment()
#+meta {
  :schema {
    :columns [{:name "date_day", :type "DATE"},
              {:name "sentiment", :type "FLOAT"},
              {:name "summary", :type "STRING"}]
  },
  :publication {
    :type "table",
    :name "daily_sentiment",
    :incremental "merge",
    :unique_key ["date_day"]
  }
}
#+begin
Analyze today's customer feedback and generate a sentiment score.
#+end

Running Publications Programmatically

#+import "std/publication"

{{ publication.Run(blocks=[GenerateMetrics, DailySentiment], parallelism=2) }}

warning

View publication (:type "view") is not supported for LLM blocks. Table, CSV, and Parquet publication are all supported; file publication (CSV/Parquet) follows the same rules as for SQL blocks and cannot be referenced from SQL.

Common Patterns

Test Data Generation

#+src llm TestOrders()
#+meta {
  :schema {
    :columns [{:name "order_id", :type "INT"},
              {:name "customer_id", :type "INT"},
              {:name "product", :type "STRING"},
              {:name "amount", :type "DECIMAL(10,2)"},
              {:name "status", :type "STRING", :description "pending, shipped, or delivered"}]
  }
}
#+begin
Generate 50 realistic e-commerce order records.
Ensure a realistic distribution of statuses: 20% pending, 30% shipped, 50% delivered.
#+end

Data Enrichment

#+import "std/print"

#+src sql RawAddresses()
#+begin
SELECT address_id, street_address, city, state FROM addresses WHERE country_code IS NULL LIMIT 100
#+end

#+src llm EnrichedAddresses()
#+meta {
  :schema {
    :columns [{:name "address_id", :type "INT"},
              {:name "country_code", :type "STRING", :description "ISO 3166-1 alpha-2 country code"}]
  }
}
#+begin
For each address below, determine the most likely country code (ISO 3166-1 alpha-2).

{{ print.Csv(RawAddresses()) }}
#+end

#+src sql UpdateAddresses()
#+begin
UPDATE addresses a
SET country_code = e.country_code
FROM {{ EnrichedAddresses() }} e
WHERE a.address_id = e.address_id
#+end

Conditional LLM Prompt Generation

#+src llm FlexibleAnalysis(detail_level)
#+meta {
  :schema {
    :columns [{:name "finding", :type "STRING"},
              {:name "severity", :type "STRING"}]
  }
}
#+const
  is_detailed = detail_level == "high";
#+begin
Analyze system health metrics and generate findings.
#+if is_detailed then
Include root cause analysis and remediation steps for each finding.
Provide severity as CRITICAL, HIGH, MEDIUM, or LOW.
#+else
Provide a high-level summary only.
Provide severity as HIGH or LOW.
#+end
#+end

Handling Errors

An LLM block fails with a script error when the response can't be reconciled with its schema. A few behaviors are worth knowing:

Automatic retry. If the model returns a malformed response or a value that can't be converted to the declared type, Coginiti re-issues the call (up to 3 total attempts), feeding the error back so the model can self-correct. You don't need to handle transient formatting glitches yourself.
Model refusals are final. If the model decides a value can't be produced faithfully, it returns a short refusal instead of rows. This surfaces as an error and is not retried — adjust the prompt or schema. :validation "strict" makes refusals more likely (by design); "lenient" coerces where it can.
Validate the output. AI output varies between runs. Add data quality tests to catch values that pass schema validation but are still wrong.

For the full list of error messages and fixes, see the LLM block error reference.

Limitations

Platform support: SQL blocks can only reference LLM blocks on platforms that support creating tables from recordset data. Athena and SQL Server are not currently supported.
No direct LLM-to-LLM table references: {{ llmBlock() }} cannot be used as a relation inside another LLM block. Use print.Csv() or iterator() instead.
No view publication: Table, CSV, and Parquet publication are supported, but view publication is not.
Schema required for SQL references: LLM blocks referenced by SQL blocks should have a :schema defined.

Next Steps

CoginitiScript Reference: LLM Blocks — Full syntax reference
Building an AI-Powered Data Pipeline Tutorial — End-to-end tutorial
CoginitiScript: Generating SQL Dynamically — Dynamic SQL patterns that pair well with LLM blocks

What You'll Learn​

Prerequisites​

Generating Structured Data with LLM Blocks​

The Basics​

Using Schema Descriptions for Better Results​

Supported Data Types​

Controlling Validation and Nullability​

Querying LLM Data with SQL​

Basic SQL Reference​

Joining Multiple LLM Blocks​

Passing SQL Data into LLM Prompts​

Chaining LLM Blocks​

Using print.Csv() for Bulk Data​

Using iterator() for Row-Level Access​

Parameterizing LLM Blocks​

Block Arguments​

Script Parameters​

Using Constants​

Controlling Result Caching​

Publishing LLM Results​

Incremental Publication​

Running Publications Programmatically​

Common Patterns​

Test Data Generation​

Data Enrichment​

Conditional LLM Prompt Generation​

Handling Errors​

Limitations​

Next Steps​